Dataset statistics
| Number of variables | 27 |
|---|---|
| Number of observations | 4,803 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 10.0 MiB |
| Average record size in memory | 2.1 KiB |
Variable types
| CAT | 19 |
|---|---|
| NUM | 6 |
| UNSUPPORTED | 2 |
Warnings
| Warning | Type |
|---|---|
| budget has a high cardinality: 437 distinct values | High cardinality |
| genres has a high cardinality: 1175 distinct values | High cardinality |
| homepage has a high cardinality: 1692 distinct values | High cardinality |
| plot_keywords has a high cardinality: 4222 distinct values | High cardinality |
| original_title has a high cardinality: 4801 distinct values | High cardinality |
| overview has a high cardinality: 4801 distinct values | High cardinality |
| production_companies has a high cardinality: 3697 distinct values | High cardinality |
| production_countries has a high cardinality: 469 distinct values | High cardinality |
| release_date has a high cardinality: 3281 distinct values | High cardinality |
| spoken_languages has a high cardinality: 544 distinct values | High cardinality |
| tagline has a high cardinality: 3945 distinct values | High cardinality |
| movie_title has a high cardinality: 4800 distinct values | High cardinality |
| country has a high cardinality: 71 distinct values | High cardinality |
| director_name has a high cardinality: 2350 distinct values | High cardinality |
| actor_1_name has a high cardinality: 2721 distinct values | High cardinality |
| actor_2_name has a high cardinality: 3096 distinct values | High cardinality |
| actor_3_name has a high cardinality: 3373 distinct values | High cardinality |
| original_title is uniformly distributed | Uniform |
| overview is uniformly distributed | Uniform |
| release_date is uniformly distributed | Uniform |
| movie_title is uniformly distributed | Uniform |
| Unnamed: 0 has unique values | Unique |
| id has unique values | Unique |
| duration is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
| title_year is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
| gross has 1427 (29.7%) zeros | Zeros |
| vote_average has 63 (1.3%) zeros | Zeros |
| num_voted_users has 62 (1.3%) zeros | Zeros |
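The warnings above boil down to simple per-column counts. The sketch below shows one way such flags could be derived. The helper names, the 50-value cardinality threshold and the 1% zeros cut-off are assumptions for illustration, not settings read from this report's configuration, and the sample columns are synthetic stand-ins.

```python
def zeros_warning(values, min_share=0.01):
    """Flag a numeric column whose share of zeros reaches min_share (assumed cut-off)."""
    zeros = sum(1 for v in values if v == 0)
    share = zeros / len(values)
    if share >= min_share:
        return f"{zeros} ({share:.1%}) zeros"
    return None


def cardinality_warning(values, threshold=50):
    """Flag a categorical column with more than `threshold` distinct values (assumed)."""
    distinct = len(set(values))
    if distinct == len(values):
        return "unique values"
    if distinct > threshold:
        return f"high cardinality: {distinct} distinct values"
    return None


# Synthetic stand-in for `gross`: 1427 zeros among 4803 rows.
gross_like = [0] * 1427 + list(range(1, 3377))
print(zeros_warning(gross_like))

# Synthetic stand-in for `country`: 71 distinct labels spread over 4803 rows.
country_like = [f"country_{i % 71}" for i in range(4803)]
print(cardinality_warning(country_like))
```

With these assumed thresholds, a column with 47 distinct values (as reported for `language`) is not flagged, and a single zero in 4,803 rows stays below the zeros cut-off.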
Reproduction
| Analysis started | 2020-12-16 15:46:00.473061 |
|---|---|
| Analysis finished | 2020-12-16 15:46:19.333918 |
| Duration | 18.86 seconds |
| Software version | pandas-profiling v2.9.0 |
| Download configuration | config.yaml |
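The reported duration is the difference between the two timestamps. A minimal sketch with the Python standard library (variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the Reproduction table.
FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2020-12-16 15:46:00.473061", FMT)
finished = datetime.strptime("2020-12-16 15:46:19.333918", FMT)

# Subtracting two datetimes yields a timedelta.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```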
Unnamed: 0
Numeric
| Distinct | 4803 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2401 |
|---|---|
| Minimum | 0 |
| Maximum | 4802 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Memory size | 37.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 240.1 |
| Q1 | 1200.5 |
| median | 2401 |
| Q3 | 3601.5 |
| 95-th percentile | 4561.9 |
| Maximum | 4802 |
| Range | 4802 |
| Interquartile range (IQR) | 2401 |
Descriptive statistics
| Standard deviation | 1386.651002 |
|---|---|
| Coefficient of variation (CV) | 0.5775306129 |
| Kurtosis | -1.2 |
| Mean | 2401 |
| Median Absolute Deviation (MAD) | 1201 |
| Skewness | 0 |
| Sum | 11532003 |
| Variance | 1922801 |
| Monotonicity | Strictly increasing |
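Because this column is just the row index 0 to 4802, the statistics above can be recomputed independently. The sketch below uses only the Python standard library. It assumes sample variance (n - 1 denominator) and linearly interpolated quantiles, which is what reproduces the reported figures. The variable names are illustrative.

```python
import math
import statistics

values = list(range(4803))  # 0, 1, ..., 4802

mean = statistics.mean(values)
median = statistics.median(values)
variance = statistics.variance(values)  # sample variance (n - 1 denominator)
std = math.sqrt(variance)
cv = std / mean  # coefficient of variation
mad = statistics.median(abs(v - median) for v in values)  # median absolute deviation

# "inclusive" quantiles interpolate linearly between order statistics.
cuts = statistics.quantiles(values, n=20, method="inclusive")
p5, q1, q3, p95 = cuts[0], cuts[4], cuts[14], cuts[18]
iqr = q3 - q1

print(mean, median, variance, round(std, 6), round(cv, 10), mad)
print(p5, q1, q3, p95, iqr, sum(values))
```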
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
| 2047 | 1 | < 0.1% | |
| 2572 | 1 | < 0.1% | |
| 4623 | 1 | < 0.1% | |
| 2576 | 1 | < 0.1% | |
| 529 | 1 | < 0.1% | |
| 4627 | 1 | < 0.1% | |
| 2580 | 1 | < 0.1% | |
| 533 | 1 | < 0.1% | |
| 4631 | 1 | < 0.1% | |
| 2584 | 1 | < 0.1% | |
| 537 | 1 | < 0.1% | |
| 4635 | 1 | < 0.1% | |
| 2588 | 1 | < 0.1% | |
| 541 | 1 | < 0.1% | |
| 4639 | 1 | < 0.1% | |
| 2592 | 1 | < 0.1% | |
| 545 | 1 | < 0.1% | |
| 525 | 1 | < 0.1% | |
| 4619 | 1 | < 0.1% | |
| 2596 | 1 | < 0.1% | |
| 521 | 1 | < 0.1% | |
| 501 | 1 | < 0.1% | |
| 4599 | 1 | < 0.1% | |
| 2552 | 1 | < 0.1% | |
| 505 | 1 | < 0.1% | |
| Other values (4778) | 4778 | 99.5% |
Extreme values (10 smallest)
| Value | Count | Frequency (%) | |
| 0 | 1 | < 0.1% | |
| 1 | 1 | < 0.1% | |
| 2 | 1 | < 0.1% | |
| 3 | 1 | < 0.1% | |
| 4 | 1 | < 0.1% | |
| 5 | 1 | < 0.1% | |
| 6 | 1 | < 0.1% | |
| 7 | 1 | < 0.1% | |
| 8 | 1 | < 0.1% | |
| 9 | 1 | < 0.1% |
Extreme values (10 largest)
| Value | Count | Frequency (%) | |
| 4802 | 1 | < 0.1% | |
| 4801 | 1 | < 0.1% | |
| 4800 | 1 | < 0.1% | |
| 4799 | 1 | < 0.1% | |
| 4798 | 1 | < 0.1% | |
| 4797 | 1 | < 0.1% | |
| 4796 | 1 | < 0.1% | |
| 4795 | 1 | < 0.1% | |
| 4794 | 1 | < 0.1% | |
| 4793 | 1 | < 0.1% |
budget
Categorical
| Distinct | 437 |
|---|---|
| Distinct (%) | 9.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 37.6 KiB |
| Value | Count |
|---|---|
| 0 | 1037 |
| 20000000 | 144 |
| 30000000 | 128 |
| 25000000 | 126 |
| 40000000 | 123 |
| Other values (432) | 3245 |
| Value | Count | Frequency (%) | |
| 0 | 1037 | 21.6% | |
| 20000000 | 144 | 3.0% | |
| 30000000 | 128 | 2.7% | |
| 25000000 | 126 | 2.6% | |
| 40000000 | 123 | 2.6% | |
| 15000000 | 120 | 2.5% | |
| 35000000 | 102 | 2.1% | |
| 50000000 | 101 | 2.1% | |
| 10000000 | 101 | 2.1% | |
| 60000000 | 86 | 1.8% | |
| 5000000 | 84 | 1.7% | |
| 12000000 | 79 | 1.6% | |
| 8000000 | 62 | 1.3% | |
| 70000000 | 60 | 1.2% | |
| 80000000 | 59 | 1.2% | |
| 18000000 | 59 | 1.2% | |
| 7000000 | 55 | 1.1% | |
| 6000000 | 55 | 1.1% | |
| 2000000 | 54 | 1.1% | |
| 45000000 | 52 | 1.1% | |
| 3000000 | 51 | 1.1% | |
| 4000000 | 49 | 1.0% | |
| 1000000 | 48 | 1.0% | |
| 75000000 | 47 | 1.0% | |
| 55000000 | 45 | 0.9% | |
| Other values (412) | 1876 | 39.1% |
Frequencies of value counts
Unique
| Unique | 232 |
|---|---|
| Unique (%) | 4.8% |
Histogram of lengths of the category
Length
| Max length | 10 |
|---|---|
| Median length | 8 |
| Mean length | 6.271913387 |
| Min length | 1 |
Most occurring characters
| Value | Count | Frequency (%) | |
| 0 | 23650 | 78.5% | |
| 5 | 1343 | 4.5% | |
| 1 | 1266 | 4.2% | |
| 2 | 1037 | 3.4% | |
| 3 | 720 | 2.4% | |
| 4 | 520 | 1.7% | |
| 6 | 469 | 1.6% | |
| 8 | 467 | 1.6% | |
| 7 | 426 | 1.4% | |
| 9 | 225 | 0.7% | |
| t | 1 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) | |
| Decimal Number | 30123 | > 99.9% | |
| Lowercase Letter | 1 | < 0.1% |
Most frequent Decimal Number characters
| Value | Count | Frequency (%) | |
| 0 | 23650 | 78.5% | |
| 5 | 1343 | 4.5% | |
| 1 | 1266 | 4.2% | |
| 2 | 1037 | 3.4% | |
| 3 | 720 | 2.4% | |
| 4 | 520 | 1.7% | |
| 6 | 469 | 1.6% | |
| 8 | 467 | 1.6% | |
| 7 | 426 | 1.4% | |
| 9 | 225 | 0.7% |
Most frequent Lowercase Letter characters
| Value | Count | Frequency (%) | |
| t | 1 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) | |
| Common | 30123 | > 99.9% | |
| Latin | 1 | < 0.1% |
Most frequent Common characters
| Value | Count | Frequency (%) | |
| 0 | 23650 | 78.5% | |
| 5 | 1343 | 4.5% | |
| 1 | 1266 | 4.2% | |
| 2 | 1037 | 3.4% | |
| 3 | 720 | 2.4% | |
| 4 | 520 | 1.7% | |
| 6 | 469 | 1.6% | |
| 8 | 467 | 1.6% | |
| 7 | 426 | 1.4% | |
| 9 | 225 | 0.7% |
Most frequent Latin characters
| Value | Count | Frequency (%) | |
| t | 1 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) | |
| ASCII | 30124 | 100.0% |
Most frequent ASCII characters
| Value | Count | Frequency (%) | |
| 0 | 23650 | 78.5% | |
| 5 | 1343 | 4.5% | |
| 1 | 1266 | 4.2% | |
| 2 | 1037 | 3.4% | |
| 3 | 720 | 2.4% | |
| 4 | 520 | 1.7% | |
| 6 | 469 | 1.6% | |
| 8 | 467 | 1.6% | |
| 7 | 426 | 1.4% | |
| 9 | 225 | 0.7% | |
| t | 1 | < 0.1% |
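The character tables show a single lowercase `t` among 30,124 otherwise numeric characters, which is consistent with `budget` being profiled as categorical rather than numeric. One way to isolate such entries before converting the column is sketched below. `split_numeric` is a hypothetical helper, and the sample list, including the `"t"` entry, is made up, since the report does not show the offending value.

```python
def split_numeric(raw_values):
    """Separate values that parse as integers from those that do not."""
    parsed, rejected = [], []
    for raw in raw_values:
        try:
            parsed.append(int(raw))
        except ValueError:
            rejected.append(raw)  # keep the original text for inspection
    return parsed, rejected


# Hypothetical sample; "t" stands in for the unknown entry containing the letter.
sample = ["0", "20000000", "30000000", "t", "25000000"]
budgets, bad = split_numeric(sample)
print(budgets)
print(bad)
```

Rejected values can then be reviewed by hand or mapped to a missing-value marker before the column is re-profiled as numeric.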
genres
Categorical
| Distinct | 1175 |
|---|---|
| Distinct (%) | 24.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 37.6 KiB |
| Value | Count |
|---|---|
| Drama | 370 |
| Comedy | 282 |
| Drama\|Romance | 164 |
| Comedy\|Romance | 144 |
| Comedy\|Drama | 142 |
| Other values (1170) | 3701 |
| Value | Count | Frequency (%) | |
| Drama | 370 | 7.7% | |
| Comedy | 282 | 5.9% | |
| Drama\|Romance | 164 | 3.4% | |
| Comedy\|Romance | 144 | 3.0% | |
| Comedy\|Drama | 142 | 3.0% | |
| Comedy\|Drama\|Romance | 109 | 2.3% | |
| Horror\|Thriller | 88 | 1.8% | |
| Documentary | 68 | 1.4% | |
| Horror | 64 | 1.3% | |
| Drama\|Thriller | 62 | 1.3% | |
| Drama\|Comedy | 46 | 1.0% | |
| Crime\|Drama\|Thriller | 43 | 0.9% | |
| Action\|Thriller | 40 | 0.8% | |
| Drama\|History | 37 | 0.8% | |
| Action\|Comedy | 36 | 0.7% | |
| Comedy\|Family | 36 | 0.7% | |
| Drama\|Comedy\|Romance | 35 | 0.7% | |
| Crime\|Drama | 33 | 0.7% | |
| Comedy\|Crime | 30 | 0.6% | |
| Action\|Crime\|Thriller | 30 | 0.6% | |
| UNK | 28 | 0.6% | |
| Drama\|Crime | 26 | 0.5% | |
| Animation\|Family | 25 | 0.5% | |
| Action\|Crime\|Drama\|Thriller | 25 | 0.5% | |
| Adventure\|Action\|Thriller | 24 | 0.5% | |
| Other values (1150) | 2816 | 58.6% |
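Each `genres` value is a pipe-separated list, and the order of the labels matters to the profiler: Comedy with Drama appears twice in the table, once in each order. Splitting on the separator gives per-genre totals and an order-insensitive view. The sketch below uses a subset of the counts from the table. The variable names are illustrative.

```python
from collections import Counter

# Counts copied from the frequency table (a subset of the rows).
combo_counts = {
    "Drama": 370,
    "Comedy": 282,
    "Drama|Romance": 164,
    "Comedy|Romance": 144,
    "Comedy|Drama": 142,
    "Drama|Comedy": 46,
}

per_genre = Counter()  # how many rows mention each individual genre
per_set = Counter()    # combinations counted regardless of label order
for combo, count in combo_counts.items():
    genres = combo.split("|")
    for genre in genres:
        per_genre[genre] += count
    per_set["|".join(sorted(genres))] += count

print(per_genre.most_common())
print(per_set.most_common())
```

On this subset the two orderings of Comedy and Drama merge into one key, so six distinct strings collapse to five combinations.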
Frequencies of value counts
Unique
| Unique | 739 |
|---|---|
| Unique (%) | 15.4% |
Histogram of lengths of the category
Length
| Max length | 64 |
|---|---|
| Median length | 18 |
| Mean length | 18.69643973 |
| Min length | 3 |
Most occurring characters
| Value | Count | Frequency (%) | |
| r | 8803 | 9.8% | |
| e | 7900 | 8.8% | |
| \| | 7385 | 8.2% | |
| a | 7337 | 8.2% | |
| m | 6466 | 7.2% | |
| i | 6134 | 6.8% | |
| o | 5926 | 6.6% | |
| n | 5026 | 5.6% | |
| c | 3948 | 4.4% | |
| t | 3874 | 4.3% | |
| y | 3662 | 4.1% | |
| l | 3061 | 3.4% | |
| d | 2512 | 2.8% | |
| C | 2418 | 2.7% | |
| D | 2407 | 2.7% | |
| A | 2178 | 2.4% | |
| F | 1506 | 1.7% | |
| T | 1282 | 1.4% | |
| h | 1274 | 1.4% | |
| s | 1236 | 1.4% | |
| u | 1085 | 1.2% | |
| R | 894 | 1.0% | |
| v | 798 | 0.9% | |
| H | 716 | 0.8% | |
| (space) | 543 | 0.6% | |
| Other values (8) | 1428 | 1.6% |
Most occurring categories
| Value | Count | Frequency (%) | |
| Lowercase Letter | 69076 | 76.9% | |
| Uppercase Letter | 12795 | 14.2% | |
| Math Symbol | 7385 | 8.2% | |
| Space Separator | 543 | 0.6% |
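The category names in this table match the Unicode general categories, which explains why the `|` separator is counted as a Math Symbol. A minimal sketch with Python's standard `unicodedata` module follows. The sample string is hypothetical, `category_counts` is an illustrative helper, and the profiler's own implementation may differ.

```python
import unicodedata
from collections import Counter

# Long names for the two-letter general-category codes that occur in this sample.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Sm": "Math Symbol",
    "Zs": "Space Separator",
}


def category_counts(text):
    """Count characters per Unicode general category, using long names where known."""
    codes = Counter(unicodedata.category(ch) for ch in text)
    return {CATEGORY_NAMES.get(code, code): n for code, n in codes.items()}


counts = category_counts("Action|Science Fiction")  # hypothetical genres value
print(counts)
```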
Most frequent Uppercase Letter characters
| Value | Count | Frequency (%) | |
| C | 2418 | 18.9% | |
| D | 2407 | 18.8% | |
| A | 2178 | 17.0% | |
| F | 1506 | 11.8% | |
| T | 1282 | 10.0% | |
| R | 894 | 7.0% | |
| H | 716 | 5.6% | |
| M | 541 | 4.2% | |
| S | 535 | 4.2% | |
| W | 226 | 1.8% | |
| U | 28 | 0.2% | |
| N | 28 | 0.2% | |
| K | 28 | 0.2% | |
| V | 8 | 0.1% |
Most frequent Lowercase Letter characters
| Value | Count | Frequency (%) | |
| r | 8803 | 12.7% | |
| e | 7900 | 11.4% | |
| a | 7337 | 10.6% | |
| m | 6466 | 9.4% | |
| i | 6134 | 8.9% | |
| o | 5926 | 8.6% | |
| n | 5026 | 7.3% | |
| c | 3948 | 5.7% | |
| t | 3874 | 5.6% | |
| y | 3662 | 5.3% | |
| l | 3061 | 4.4% | |
| d | 2512 | 3.6% | |
| h | 1274 | 1.8% | |
| s | 1236 | 1.8% | |
| u | 1085 | 1.6% | |
| v | 798 | 1.2% | |
| g | 34 | < 0.1% |
Most frequent Math Symbol characters
| Value | Count | Frequency (%) | |
| \| | 7385 | 100.0% |
Most frequent Space Separator characters
| Value | Count | Frequency (%) | |
| (space) | 543 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) | |
| Latin | 81871 | 91.2% | |
| Common | 7928 | 8.8% |
Most frequent Latin characters
| Value | Count | Frequency (%) | |
| r | 8803 | 10.8% | |
| e | 7900 | 9.6% | |
| a | 7337 | 9.0% | |
| m | 6466 | 7.9% | |
| i | 6134 | 7.5% | |
| o | 5926 | 7.2% | |
| n | 5026 | 6.1% | |
| c | 3948 | 4.8% | |
| t | 3874 | 4.7% | |
| y | 3662 | 4.5% | |
| l | 3061 | 3.7% | |
| d | 2512 | 3.1% | |
| C | 2418 | 3.0% | |
| D | 2407 | 2.9% | |
| A | 2178 | 2.7% | |
| F | 1506 | 1.8% | |
| T | 1282 | 1.6% | |
| h | 1274 | 1.6% | |
| s | 1236 | 1.5% | |
| u | 1085 | 1.3% | |
| R | 894 | 1.1% | |
| v | 798 | 1.0% | |
| H | 716 | 0.9% | |
| M | 541 | 0.7% | |
| S | 535 | 0.7% | |
| Other values (6) | 352 | 0.4% |
Most frequent Common characters
| Value | Count | Frequency (%) | |
| \| | 7385 | 93.2% | |
| (space) | 543 | 6.8% |
Most occurring blocks
| Value | Count | Frequency (%) | |
| ASCII | 89799 | 100.0% |
Most frequent ASCII characters
| Value | Count | Frequency (%) | |
| r | 8803 | 9.8% | |
| e | 7900 | 8.8% | |
| \| | 7385 | 8.2% | |
| a | 7337 | 8.2% | |
| m | 6466 | 7.2% | |
| i | 6134 | 6.8% | |
| o | 5926 | 6.6% | |
| n | 5026 | 5.6% | |
| c | 3948 | 4.4% | |
| t | 3874 | 4.3% | |
| y | 3662 | 4.1% | |
| l | 3061 | 3.4% | |
| d | 2512 | 2.8% | |
| C | 2418 | 2.7% | |
| D | 2407 | 2.7% | |
| A | 2178 | 2.4% | |
| F | 1506 | 1.7% | |
| T | 1282 | 1.4% | |
| h | 1274 | 1.4% | |
| s | 1236 | 1.4% | |
| u | 1085 | 1.2% | |
| R | 894 | 1.0% | |
| v | 798 | 0.9% | |
| H | 716 | 0.8% | |
| (space) | 543 | 0.6% | |
| Other values (8) | 1428 | 1.6% |
homepage
Categorical
| Distinct | 1692 |
|---|---|
| Distinct (%) | 35.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 37.6 KiB |
| Value | Count |
|---|---|
| UNK | 3091 |
| http://www.missionimpossible.com/ | 4 |
| http://www.thehungergames.movie/ | 4 |
| http://www.thehobbit.com/ | 3 |
| http://www.transformersmovie.com/ | 3 |
| Other values (1687) | 1698 |
| Value | Count | Frequency (%) | |
| UNK | 3091 | 64.4% | |
| http://www.missionimpossible.com/ | 4 | 0.1% | |
| http://www.thehungergames.movie/ | 4 | 0.1% | |
| http://www.thehobbit.com/ | 3 | 0.1% | |
| http://www.transformersmovie.com/ | 3 | 0.1% | |
| http://www.kungfupanda.com/ | 3 | 0.1% | |
| http://www.ironmanmovie.com/ | 2 | < 0.1% | |
| http://www.howtotrainyourdragon.com/ | 2 | < 0.1% | |
| http://www.lordoftherings.net/ | 2 | < 0.1% | |
| http://www.indianajones.com | 2 | < 0.1% | |
| http://www.workandtheglory.com/ | 2 | < 0.1% | |
| http://www.munkyourself.com/ | 2 | < 0.1% | |
| http://disney.go.com/disneypictures/pirates/ | 2 | < 0.1% | |
| http://www.riomovies.com/ | 2 | < 0.1% | |
| http://www.theamazingspiderman.com | 2 | < 0.1% | |
| http://www.kickstarter.com/projects/1094772583/the-canyons | 1 | < 0.1% | |
| http://www.mgm.com/view/movie/232/Die-Another-Day/ | 1 | < 0.1% | |
| http://robzombie.com/movies/the-lords-of-salem/ | 1 | < 0.1% | |
| http://paulblartmallcop.com/ | 1 | < 0.1% | |
| http://www.beastlythemovie.com/ | 1 | < 0.1% | |
| http://shanghaicalling.com/ | 1 | < 0.1% | |
| http://invictusmovie.warnerbros.com | 1 | < 0.1% | |
| http://www.findnumberfour.com/ | 1 | < 0.1% | |
| http://www.starwars.com/films/star-wars-episode-ii-attack-of-the-clones | 1 | < 0.1% | |
| http://www.gonegirlmovie.com/ | 1 | < 0.1% | |
| Other values (1667) | 1667 | 34.7% |
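The report counts zero missing cells, yet `UNK` is the most frequent `homepage` value. If `UNK` is a stand-in for missing data (an assumption; the report itself treats these cells as present), the effective missing rate per column can be estimated from the frequency tables. The counts below are copied from this report. The variable names are illustrative.

```python
N_ROWS = 4803

# "UNK" counts copied from the frequency tables in this report.
unk_counts = {"homepage": 3091, "plot_keywords": 412, "language": 98, "genres": 28}

# Share of rows per column that would be missing if "UNK" means "no value".
effective_missing = {col: n / N_ROWS for col, n in unk_counts.items()}

for col, share in sorted(effective_missing.items(), key=lambda kv: -kv[1]):
    print(f"{col:15s} {share:6.1%}")
```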
Frequencies of value counts
Unique
| Unique | 1677 |
|---|---|
| Unique (%) | 34.9% |
Histogram of lengths of the category
Length
| Max length | 138 |
|---|---|
| Median length | 3 |
| Mean length | 14.91234645 |
| Min length | 3 |
Most occurring characters
| Value | Count | Frequency (%) | |
| t | 5707 | 8.0% | |
| / | 5701 | 8.0% | |
| e | 4467 | 6.2% | |
| o | 4438 | 6.2% | |
| w | 4418 | 6.2% | |
| m | 3492 | 4.9% | |
| . | 3393 | 4.7% | |
| h | 3110 | 4.3% | |
| N | 3100 | 4.3% | |
| U | 3094 | 4.3% | |
| K | 3093 | 4.3% | |
| i | 3017 | 4.2% | |
| c | 2543 | 3.6% | |
| p | 2347 | 3.3% | |
| s | 2280 | 3.2% | |
| r | 2209 | 3.1% | |
| a | 2196 | 3.1% | |
| n | 2051 | 2.9% | |
| : | 1713 | 2.4% | |
| l | 1414 | 2.0% | |
| v | 1162 | 1.6% | |
| d | 1081 | 1.5% | |
| u | 793 | 1.1% | |
| g | 685 | 1.0% | |
| f | 684 | 1.0% | |
| Other values (53) | 3436 | 4.8% |
Most occurring categories
| Value | Count | Frequency (%) | |
| Lowercase Letter | 49986 | 69.8% | |
| Other Punctuation | 10851 | 15.1% | |
| Uppercase Letter | 9518 | 13.3% | |
| Dash Punctuation | 613 | 0.9% | |
| Decimal Number | 557 | 0.8% | |
| Connector Punctuation | 68 | 0.1% | |
| Math Symbol | 22 | < 0.1% | |
| Open Punctuation | 4 | < 0.1% | |
| Close Punctuation | 4 | < 0.1% | |
| Space Separator | 1 | < 0.1% |
Most frequent Lowercase Letter characters
| Value | Count | Frequency (%) | |
| t | 5707 | 11.4% | |
| e | 4467 | 8.9% | |
| o | 4438 | 8.9% | |
| w | 4418 | 8.8% | |
| m | 3492 | 7.0% | |
| h | 3110 | 6.2% | |
| i | 3017 | 6.0% | |
| c | 2543 | 5.1% | |
| p | 2347 | 4.7% | |
| s | 2280 | 4.6% | |
| r | 2209 | 4.4% | |
| a | 2196 | 4.4% | |
| n | 2051 | 4.1% | |
| l | 1414 | 2.8% | |
| v | 1162 | 2.3% | |
| d | 1081 | 2.2% | |
| u | 793 | 1.6% | |
| g | 685 | 1.4% | |
| f | 684 | 1.4% | |
| y | 585 | 1.2% | |
| b | 563 | 1.1% | |
| k | 357 | 0.7% | |
| x | 191 | 0.4% | |
| j | 123 | 0.2% | |
| z | 61 | 0.1% |
Most frequent Other Punctuation characters
| Value | Count | Frequency (%) | |
| / | 5701 | 52.5% | |
| . | 3393 | 31.3% | |
| : | 1713 | 15.8% | |
| # | 18 | 0.2% | |
| ? | 18 | 0.2% | |
| % | 3 | < 0.1% | |
| & | 3 | < 0.1% | |
| ! | 1 | < 0.1% | |
| , | 1 | < 0.1% |
Most frequent Dash Punctuation characters
| Value | Count | Frequency (%) | |
| - | 613 | 100.0% |
Most frequent Decimal Number characters
| Value | Count | Frequency (%) | |
| 2 | 104 | 18.7% | |
| 1 | 87 | 15.6% | |
| 0 | 84 | 15.1% | |
| 3 | 71 | 12.7% | |
| 9 | 44 | 7.9% | |
| 5 | 35 | 6.3% | |
| 4 | 34 | 6.1% | |
| 7 | 33 | 5.9% | |
| 6 | 33 | 5.9% | |
| 8 | 32 | 5.7% |
Most frequent Connector Punctuation characters
| Value | Count | Frequency (%) | |
| _ | 68 | 100.0% |
Most frequent Uppercase Letter characters
| Value | Count | Frequency (%) | |
| N | 3100 | 32.6% | |
| U | 3094 | 32.5% | |
| K | 3093 | 32.5% | |
| D | 22 | 0.2% | |
| A | 20 | 0.2% | |
| T | 20 | 0.2% | |
| M | 19 | 0.2% | |
| S | 17 | 0.2% | |
| E | 14 | 0.1% | |
| L | 13 | 0.1% | |
| G | 12 | 0.1% | |
| W | 10 | 0.1% | |
| H | 10 | 0.1% | |
| B | 10 | 0.1% | |
| F | 10 | 0.1% | |
| C | 9 | 0.1% | |
| I | 9 | 0.1% | |
| R | 8 | 0.1% | |
| O | 8 | 0.1% | |
| P | 6 | 0.1% | |
| V | 5 | 0.1% | |
| Y | 5 | 0.1% | |
| J | 2 | < 0.1% | |
| Q | 1 | < 0.1% | |
| Z | 1 | < 0.1% |
Most frequent Math Symbol characters
| Value | Count | Frequency (%) | |
| = | 22 | 100.0% |
Most frequent Open Punctuation characters
| Value | Count | Frequency (%) | |
| ( | 3 | 75.0% | |
| { | 1 | 25.0% |
Most frequent Close Punctuation characters
| Value | Count | Frequency (%) | |
| ) | 3 | 75.0% | |
| } | 1 | 25.0% |
Most frequent Space Separator characters
| Value | Count | Frequency (%) | |
| (space) | 1 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) | |
| Latin | 59504 | 83.1% | |
| Common | 12120 | 16.9% |
Most frequent Latin characters
| Value | Count | Frequency (%) | |
| t | 5707 | 9.6% | |
| e | 4467 | 7.5% | |
| o | 4438 | 7.5% | |
| w | 4418 | 7.4% | |
| m | 3492 | 5.9% | |
| h | 3110 | 5.2% | |
| N | 3100 | 5.2% | |
| U | 3094 | 5.2% | |
| K | 3093 | 5.2% | |
| i | 3017 | 5.1% | |
| c | 2543 | 4.3% | |
| p | 2347 | 3.9% | |
| s | 2280 | 3.8% | |
| r | 2209 | 3.7% | |
| a | 2196 | 3.7% | |
| n | 2051 | 3.4% | |
| l | 1414 | 2.4% | |
| v | 1162 | 2.0% | |
| d | 1081 | 1.8% | |
| u | 793 | 1.3% | |
| g | 685 | 1.2% | |
| f | 684 | 1.1% | |
| y | 585 | 1.0% | |
| b | 563 | 0.9% | |
| k | 357 | 0.6% | |
| Other values (26) | 618 | 1.0% |
Most frequent Common characters
| Value | Count | Frequency (%) | |
| / | 5701 | 47.0% | |
| . | 3393 | 28.0% | |
| : | 1713 | 14.1% | |
| - | 613 | 5.1% | |
| 2 | 104 | 0.9% | |
| 1 | 87 | 0.7% | |
| 0 | 84 | 0.7% | |
| 3 | 71 | 0.6% | |
| _ | 68 | 0.6% | |
| 9 | 44 | 0.4% | |
| 5 | 35 | 0.3% | |
| 4 | 34 | 0.3% | |
| 7 | 33 | 0.3% | |
| 6 | 33 | 0.3% | |
| 8 | 32 | 0.3% | |
| = | 22 | 0.2% | |
| # | 18 | 0.1% | |
| ? | 18 | 0.1% | |
| ( | 3 | < 0.1% | |
| ) | 3 | < 0.1% | |
| % | 3 | < 0.1% | |
| & | 3 | < 0.1% | |
| ! | 1 | < 0.1% | |
| (space) | 1 | < 0.1% | |
| , | 1 | < 0.1% | |
| Other values (2) | 2 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) | |
| ASCII | 71624 | 100.0% |
Most frequent ASCII characters
| Value | Count | Frequency (%) | |
| t | 5707 | 8.0% | |
| / | 5701 | 8.0% | |
| e | 4467 | 6.2% | |
| o | 4438 | 6.2% | |
| w | 4418 | 6.2% | |
| m | 3492 | 4.9% | |
| . | 3393 | 4.7% | |
| h | 3110 | 4.3% | |
| N | 3100 | 4.3% | |
| U | 3094 | 4.3% | |
| K | 3093 | 4.3% | |
| i | 3017 | 4.2% | |
| c | 2543 | 3.6% | |
| p | 2347 | 3.3% | |
| s | 2280 | 3.2% | |
| r | 2209 | 3.1% | |
| a | 2196 | 3.1% | |
| n | 2051 | 2.9% | |
| : | 1713 | 2.4% | |
| l | 1414 | 2.0% | |
| v | 1162 | 1.6% | |
| d | 1081 | 1.5% | |
| u | 793 | 1.1% | |
| g | 685 | 1.0% | |
| f | 684 | 1.0% | |
| Other values (53) | 3436 | 4.8% |
id
Numeric
| Distinct | 4803 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 57165.48428 |
|---|---|
| Minimum | 5 |
| Maximum | 459488 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 37.6 KiB |
Quantile statistics
| Minimum | 5 |
|---|---|
| 5-th percentile | 578.1 |
| Q1 | 9014.5 |
| median | 14629 |
| Q3 | 58610.5 |
| 95-th percentile | 285779 |
| Maximum | 459488 |
| Range | 459483 |
| Interquartile range (IQR) | 49596 |
Descriptive statistics
| Standard deviation | 88694.61403 |
|---|---|
| Coefficient of variation (CV) | 1.551541374 |
| Kurtosis | 3.346747662 |
| Mean | 57165.48428 |
| Median Absolute Deviation (MAD) | 12920 |
| Skewness | 2.072080474 |
| Sum | 274565821 |
| Variance | 7866734559 |
| Monotonicity | Not monotonic |
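Several of the figures above are derived from others, so they can be cross-checked without access to the data. The sketch below copies the reported numbers and tests the usual identities (mean = sum / n, variance = standard deviation squared, CV = standard deviation / mean, IQR = Q3 - Q1, range = max - min). The tolerance is an arbitrary choice that allows for rounding in the printed values.

```python
import math

# Figures copied from the tables above.
n, total = 4803, 274565821
mean, std, variance = 57165.48428, 88694.61403, 7866734559
cv = 1.551541374
q1, q3, iqr = 9014.5, 58610.5, 49596
minimum, maximum, value_range = 5, 459488, 459483

TOL = 1e-6  # relative tolerance for rounded figures (arbitrary)
checks = {
    "mean = sum / n": math.isclose(total / n, mean, rel_tol=TOL),
    "variance = std ** 2": math.isclose(std ** 2, variance, rel_tol=TOL),
    "cv = std / mean": math.isclose(std / mean, cv, rel_tol=TOL),
    "iqr = q3 - q1": q3 - q1 == iqr,
    "range = max - min": maximum - minimum == value_range,
}

for name, ok in checks.items():
    print(f"{name}: {'ok' if ok else 'MISMATCH'}")
```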
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
| 45054 | 1 | < 0.1% | |
| 19084 | 1 | < 0.1% | |
| 41144 | 1 | < 0.1% | |
| 13187 | 1 | < 0.1% | |
| 8849 | 1 | < 0.1% | |
| 29339 | 1 | < 0.1% | |
| 286939 | 1 | < 0.1% | |
| 673 | 1 | < 0.1% | |
| 10368 | 1 | < 0.1% | |
| 14438 | 1 | < 0.1% | |
| 68202 | 1 | < 0.1% | |
| 8869 | 1 | < 0.1% | |
| 72358 | 1 | < 0.1% | |
| 299687 | 1 | < 0.1% | |
| 681 | 1 | < 0.1% | |
| 260778 | 1 | < 0.1% | |
| 4518 | 1 | < 0.1% | |
| 109417 | 1 | < 0.1% | |
| 50942 | 1 | < 0.1% | |
| 178862 | 1 | < 0.1% | |
| 8841 | 1 | < 0.1% | |
| 1381 | 1 | < 0.1% | |
| 218 | 1 | < 0.1% | |
| 39538 | 1 | < 0.1% | |
| 4723 | 1 | < 0.1% | |
| Other values (4778) | 4778 | 99.5% |
Extreme values (10 smallest)
| Value | Count | Frequency (%) | |
| 5 | 1 | < 0.1% | |
| 11 | 1 | < 0.1% | |
| 12 | 1 | < 0.1% | |
| 13 | 1 | < 0.1% | |
| 14 | 1 | < 0.1% | |
| 16 | 1 | < 0.1% | |
| 18 | 1 | < 0.1% | |
| 19 | 1 | < 0.1% | |
| 20 | 1 | < 0.1% | |
| 22 | 1 | < 0.1% |
Extreme values (10 largest)
| Value | Count | Frequency (%) | |
| 459488 | 1 | < 0.1% | |
| 447027 | 1 | < 0.1% | |
| 433715 | 1 | < 0.1% | |
| 426469 | 1 | < 0.1% | |
| 426067 | 1 | < 0.1% | |
| 417859 | 1 | < 0.1% | |
| 408429 | 1 | < 0.1% | |
| 407887 | 1 | < 0.1% | |
| 402515 | 1 | < 0.1% | |
| 396152 | 1 | < 0.1% |
plot_keywords
Categorical
| Distinct | 4222 |
|---|---|
| Distinct (%) | 87.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 37.6 KiB |
| Value | Count |
|---|---|
| UNK | 412 |
| independent film | 55 |
| woman director | 42 |
| duringcreditsstinger | 15 |
| sport | 13 |
| Other values (4217) | 4266 |
| Value | Count | Frequency (%) | |
| UNK | 412 | 8.6% | |
| independent film | 55 | 1.1% | |
| woman director | 42 | 0.9% | |
| duringcreditsstinger | 15 | 0.3% | |
| sport | 13 | 0.3% | |
| independent film\|woman director | 10 | 0.2% | |
| musical | 5 | 0.1% | |
| biography | 5 | 0.1% | |
| suspense | 5 | 0.1% | |
| dystopia | 3 | 0.1% | |
| holiday\|christmas | 3 | 0.1% | |
| christian | 3 | 0.1% | |
| gay | 3 | 0.1% | |
| superhero | 3 | 0.1% | |
| mumblecore | 3 | 0.1% | |
| aftercreditsstinger | 3 | 0.1% | |
| tv movie | 2 | < 0.1% | |
| bank robbery | 2 | < 0.1% | |
| sport\|independent film | 2 | < 0.1% | |
| blaxploitation | 2 | < 0.1% | |
| aftercreditsstinger\|duringcreditsstinger | 2 | < 0.1% | |
| mutant\|marvel comic\|superhero\|based on comic book\|superhuman | 2 | < 0.1% | |
| soccer | 2 | < 0.1% | |
| baseball\|sport | 2 | < 0.1% | |
| road movie | 2 | < 0.1% | |
| Other values (4197) | 4202 | 87.5% |
Frequencies of value counts
Unique
| Unique | 4192 |
|---|---|
| Unique (%) | 87.3% |
Histogram of lengths of the category
Length
| Max length | 1254 |
|---|---|
| Median length | 68 |
| Mean length | 83.35956694 |
| Min length | 2 |
Most occurring characters
| Value | Count | Frequency (%) | |
| e | 37856 | 9.5% | |
| \| | 31803 | 7.9% | |
| i | 30017 | 7.5% | |
| a | 29179 | 7.3% | |
| r | 28253 | 7.1% | |
| n | 25130 | 6.3% | |
| o | 24680 | 6.2% | |
| t | 23991 | 6.0% | |
| s | 22454 | 5.6% | |
| (space) | 18363 | 4.6% | |
| l | 17108 | 4.3% | |
| c | 14517 | 3.6% | |
| d | 13081 | 3.3% | |
| m | 10902 | 2.7% | |
| u | 9785 | 2.4% | |
| p | 9731 | 2.4% | |
| g | 9713 | 2.4% | |
| h | 9611 | 2.4% | |
| f | 6163 | 1.5% | |
| y | 5902 | 1.5% | |
| b | 5749 | 1.4% | |
| v | 4157 | 1.0% | |
| w | 3779 | 0.9% | |
| k | 3004 | 0.8% | |
| x | 1093 | 0.3% | |
| Other values (56) | 4355 | 1.1% |
Most occurring categories
| Value | Count | Frequency (%) | |
| Lowercase Letter | 347371 | 86.8% | |
| Math Symbol | 31804 | 7.9% | |
| Space Separator | 18389 | 4.6% | |
| Uppercase Letter | 1237 | 0.3% | |
| Decimal Number | 692 | 0.2% | |
| Dash Punctuation | 451 | 0.1% | |
| Other Punctuation | 371 | 0.1% | |
| Open Punctuation | 23 | < 0.1% | |
| Close Punctuation | 23 | < 0.1% | |
| Other Letter | 15 | < 0.1% |
Most frequent Lowercase Letter characters
| Value | Count | Frequency (%) | |
| e | 37856 | 10.9% | |
| i | 30017 | 8.6% | |
| a | 29179 | 8.4% | |
| r | 28253 | 8.1% | |
| n | 25130 | 7.2% | |
| o | 24680 | 7.1% | |
| t | 23991 | 6.9% | |
| s | 22454 | 6.5% | |
| l | 17108 | 4.9% | |
| c | 14517 | 4.2% | |
| d | 13081 | 3.8% | |
| m | 10902 | 3.1% | |
| u | 9785 | 2.8% | |
| p | 9731 | 2.8% | |
| g | 9713 | 2.8% | |
| h | 9611 | 2.8% | |
| f | 6163 | 1.8% | |
| y | 5902 | 1.7% | |
| b | 5749 | 1.7% | |
| v | 4157 | 1.2% | |
| w | 3779 | 1.1% | |
| k | 3004 | 0.9% | |
| x | 1093 | 0.3% | |
| j | 743 | 0.2% | |
| z | 462 | 0.1% | |
| Other values (14) | 311 | 0.1% |
Most frequent Space Separator characters
| Value | Count | Frequency (%) | |
| (space) | 18363 | 99.9% | |
| (non-ASCII space) | 26 | 0.1% |
Most frequent Math Symbol characters
| Value | Count | Frequency (%) | |
| \| | 31803 | > 99.9% | |
| + | 1 | < 0.1% |
Most frequent Decimal Number characters
| Value | Count | Frequency (%) | |
| 1 | 176 | 25.4% | |
| 9 | 145 | 21.0% | |
| 0 | 120 | 17.3% | |
| 3 | 106 | 15.3% | |
| 7 | 42 | 6.1% | |
| 6 | 32 | 4.6% | |
| 8 | 23 | 3.3% | |
| 5 | 22 | 3.2% | |
| 2 | 18 | 2.6% | |
| 4 | 8 | 1.2% |
Most frequent Other Punctuation characters
| Value | Count | Frequency (%) | |
| . | 182 | 49.1% | |
| ' | 171 | 46.1% | |
| , | 9 | 2.4% | |
| " | 4 | 1.1% | |
| / | 2 | 0.5% | |
| & | 2 | 0.5% | |
| * | 1 | 0.3% |
Most frequent Dash Punctuation characters
| Value | Count | Frequency (%) | |
| - | 445 | 98.7% | |
| – | 6 | 1.3% |
Most frequent Open Punctuation characters
| Value | Count | Frequency (%) | |
| ( | 23 | 100.0% |
Most frequent Close Punctuation characters
| Value | Count | Frequency (%) | |
| ) | 23 | 100.0% |
Most frequent Uppercase Letter characters
| Value | Count | Frequency (%) | |
| U | 412 | 33.3% | |
| N | 412 | 33.3% | |
| K | 412 | 33.3% | |
| Γ | 1 | 0.1% |
Most frequent Other Letter characters
| Value | Count | Frequency (%) | |
| 妈 | 3 | 20.0% | |
| 绝 | 1 | 6.7% | |
| 地 | 1 | 6.7% | |
| 奶 | 1 | 6.7% | |
| 霸 | 1 | 6.7% | |
| 卧 | 1 | 6.7% | |
| 底 | 1 | 6.7% | |
| 肥 | 1 | 6.7% | |
| 爆 | 1 | 6.7% | |
| 任 | 1 | 6.7% | |
| 务 | 1 | 6.7% | |
| 超 | 1 | 6.7% | |
| 级 | 1 | 6.7% |
Most occurring scripts
| Value | Count | Frequency (%) | |
| Latin | 348606 | 87.1% | |
| Common | 51753 | 12.9% | |
| Han | 15 | < 0.1% | |
| Greek | 2 | < 0.1% |
Most frequent Latin characters
| Value | Count | Frequency (%) | |
| e | 37856 | 10.9% | |
| i | 30017 | 8.6% | |
| a | 29179 | 8.4% | |
| r | 28253 | 8.1% | |
| n | 25130 | 7.2% | |
| o | 24680 | 7.1% | |
| t | 23991 | 6.9% | |
| s | 22454 | 6.4% | |
| l | 17108 | 4.9% | |
| c | 14517 | 4.2% | |
| d | 13081 | 3.8% | |
| m | 10902 | 3.1% | |
| u | 9785 | 2.8% | |
| p | 9731 | 2.8% | |
| g | 9713 | 2.8% | |
| h | 9611 | 2.8% | |
| f | 6163 | 1.8% | |
| y | 5902 | 1.7% | |
| b | 5749 | 1.6% | |
| v | 4157 | 1.2% | |
| w | 3779 | 1.1% | |
| k | 3004 | 0.9% | |
| x | 1093 | 0.3% | |
| j | 743 | 0.2% | |
| z | 462 | 0.1% | |
| Other values (16) | 1546 | 0.4% |
Most frequent Common characters
| Value | Count | Frequency (%) | |
| \| | 31803 | 61.5% | |
| (space) | 18363 | 35.5% | |
| - | 445 | 0.9% | |
| . | 182 | 0.4% | |
| 1 | 176 | 0.3% | |
| ' | 171 | 0.3% | |
| 9 | 145 | 0.3% | |
| 0 | 120 | 0.2% | |
| 3 | 106 | 0.2% | |
| 7 | 42 | 0.1% | |
| 6 | 32 | 0.1% | |
| (non-ASCII space) | 26 | 0.1% | |
| ( | 23 | < 0.1% | |
| ) | 23 | < 0.1% | |
| 8 | 23 | < 0.1% | |
| 5 | 22 | < 0.1% | |
| 2 | 18 | < 0.1% | |
| , | 9 | < 0.1% | |
| 4 | 8 | < 0.1% | |
| – | 6 | < 0.1% | |
| " | 4 | < 0.1% | |
| / | 2 | < 0.1% | |
| & | 2 | < 0.1% | |
| * | 1 | < 0.1% | |
| + | 1 | < 0.1% |
Most frequent Greek characters
| Value | Count | Frequency (%) | |
| Γ | 1 | 50.0% | |
| η | 1 | 50.0% |
Most frequent Han characters
| Value | Count | Frequency (%) | |
| 妈 | 3 | 20.0% | |
| 绝 | 1 | 6.7% | |
| 地 | 1 | 6.7% | |
| 奶 | 1 | 6.7% | |
| 霸 | 1 | 6.7% | |
| 卧 | 1 | 6.7% | |
| 底 | 1 | 6.7% | |
| 肥 | 1 | 6.7% | |
| 爆 | 1 | 6.7% | |
| 任 | 1 | 6.7% | |
| 务 | 1 | 6.7% | |
| 超 | 1 | 6.7% | |
| 级 | 1 | 6.7% |
Most occurring blocks
| Value | Count | Frequency (%) | |
| ASCII | 400289 | > 99.9% | |
| None | 66 | < 0.1% | |
| CJK | 15 | < 0.1% | |
| Punctuation | 6 | < 0.1% |
Most frequent ASCII characters
| Value | Count | Frequency (%) | |
| e | 37856 | 9.5% | |
| \| | 31803 | 7.9% | |
| i | 30017 | 7.5% | |
| a | 29179 | 7.3% | |
| r | 28253 | 7.1% | |
| n | 25130 | 6.3% | |
| o | 24680 | 6.2% | |
| t | 23991 | 6.0% | |
| s | 22454 | 5.6% | |
| (space) | 18363 | 4.6% | |
| l | 17108 | 4.3% | |
| c | 14517 | 3.6% | |
| d | 13081 | 3.3% | |
| m | 10902 | 2.7% | |
| u | 9785 | 2.4% | |
| p | 9731 | 2.4% | |
| g | 9713 | 2.4% | |
| h | 9611 | 2.4% | |
| f | 6163 | 1.5% | |
| y | 5902 | 1.5% | |
| b | 5749 | 1.4% | |
| v | 4157 | 1.0% | |
| w | 3779 | 0.9% | |
| k | 3004 | 0.8% | |
| x | 1093 | 0.3% | |
| Other values (27) | 4268 | 1.1% |
Most frequent None characters
| Value | Count | Frequency (%) | |
| (non-ASCII space) | 26 | 39.4% | |
| é | 23 | 34.8% | |
| ö | 3 | 4.5% | |
| ä | 2 | 3.0% | |
| ß | 2 | 3.0% | |
| á | 1 | 1.5% | |
| í | 1 | 1.5% | |
| ç | 1 | 1.5% | |
| ó | 1 | 1.5% | |
| Γ | 1 | 1.5% | |
| η | 1 | 1.5% | |
| ú | 1 | 1.5% | |
| ü | 1 | 1.5% | |
| ű | 1 | 1.5% | |
| ô | 1 | 1.5% |
Most frequent Punctuation characters
| Value | Count | Frequency (%) | |
| – | 6 | 100.0% |
Most frequent CJK characters
| Value | Count | Frequency (%) | |
| 妈 | 3 | 20.0% | |
| 绝 | 1 | 6.7% | |
| 地 | 1 | 6.7% | |
| 奶 | 1 | 6.7% | |
| 霸 | 1 | 6.7% | |
| 卧 | 1 | 6.7% | |
| 底 | 1 | 6.7% | |
| 肥 | 1 | 6.7% | |
| 爆 | 1 | 6.7% | |
| 任 | 1 | 6.7% | |
| 务 | 1 | 6.7% | |
| 超 | 1 | 6.7% | |
| 级 | 1 | 6.7% |
language
Categorical
| Distinct | 47 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 37.6 KiB |
| Value | Count |
|---|---|
| English | 4102 |
| Français | 108 |
| UNK | 98 |
| Español | 84 |
| Deutsch | 61 |
| Other values (42) | 350 |
| Value | Count | Frequency (%) | |
| English | 4102 | 85.4% | |
| Français | 108 | 2.2% | |
| UNK | 98 | 2.0% | |
| Español | 84 | 1.7% | |
| Deutsch | 61 | 1.3% | |
| العربية | 33 | 0.7% | |
| 普通话 | 32 | 0.7% | |
| Italiano | 32 | 0.7% | |
| Pусский | 31 | 0.6% | |
| Český | 30 | 0.6% | |
| 广州话 / 廣州話 | 28 | 0.6% | |
| 日本語 | 23 | 0.5% | |
| हिन्दी | 22 | 0.5% | |
| Português | 17 | 0.4% | |
| Dansk | 12 | 0.2% | |
| Latin | 8 | 0.2% | |
| 한국어/조선말 | 8 | 0.2% | |
| Nederlands | 6 | 0.1% | |
| עִבְרִית | 6 | 0.1% | |
| Afrikaans | 5 | 0.1% | |
| svenska | 5 | 0.1% | |
| ελληνικά | 5 | 0.1% | |
| Norsk | 4 | 0.1% | |
| Magyar | 4 | 0.1% | |
| Română | 4 | 0.1% | |
| Other values (22) | 35 | 0.7% |
Frequencies of value counts
Unique
| Unique | 14 |
|---|---|
| Unique (%) | 0.3% |
Histogram of lengths of the category
Length
| Max length | 16 |
|---|---|
| Median length | 7 |
| Mean length | 6.904851135 |
| Min length | 3 |
Most occurring characters
| Value | Count | Frequency (%) | |
| s | 4456 | 13.4% | |
| n | 4291 | 12.9% | |
| i | 4279 | 12.9% | |
| l | 4233 | 12.8% | |
| E | 4188 | 12.6% | |
| h | 4168 | 12.6% | |
| g | 4131 | 12.5% | |
| a | 430 | 1.3% | |
| o | 150 | 0.5% | |
| r | 147 | 0.4% | |
| t | 123 | 0.4% | |
| e | 117 | 0.4% | |
| N | 110 | 0.3% | |
| F | 108 | 0.3% | |
| ç | 108 | 0.3% | |
| K | 101 | 0.3% | |
| U | 98 | 0.3% | |
| u | 97 | 0.3% | |
| p | 87 | 0.3% | |
| ñ | 84 | 0.3% | |
| D | 73 | 0.2% | |
| с | 65 | 0.2% | |
| k | 63 | 0.2% | |
| (space) | 62 | 0.2% | |
| c | 61 | 0.2% | |
| Other values (108) | 1334 | 4.0% |
Most occurring categories
| Value | Count | Frequency (%) | |
| Lowercase Letter | 27379 | 82.6% | |
| Uppercase Letter | 4827 | 14.6% | |
| Other Letter | 763 | 2.3% | |
| Space Separator | 62 | 0.2% | |
| Spacing Mark | 49 | 0.1% | |
| Other Punctuation | 42 | 0.1% | |
| Nonspacing Mark | 42 | 0.1% |
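The category names in this table match Unicode general categories, which Python exposes through `unicodedata.category`. The sketch below counts characters per category; the `CATEGORY_NAMES` mapping covers only the seven categories listed above and, like the function name, is an assumption for illustration rather than the report tool's code.

```python
import unicodedata
from collections import Counter

# Two-letter general-category codes mapped to the long names used in the
# table above. Only a subset; other codes are reported as-is.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Lo": "Other Letter",
    "Zs": "Space Separator",
    "Mc": "Spacing Mark",
    "Mn": "Nonspacing Mark",
    "Po": "Other Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# Hypothetical sample mixing Latin, Han, punctuation and a space.
counts = category_counts(["English", "日本語", "a/b c"])
```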
Most frequent Uppercase Letter characters
| Value | Count | Frequency (%) | |
| E | 4188 | 86.8% | |
| N | 110 | 2.3% | |
| F | 108 | 2.2% | |
| K | 101 | 2.1% | |
| U | 98 | 2.0% | |
| D | 73 | 1.5% | |
| P | 51 | 1.1% | |
| I | 32 | 0.7% | |
| Č | 30 | 0.6% | |
| L | 10 | 0.2% | |
| A | 5 | 0.1% | |
| R | 4 | 0.1% | |
| M | 4 | 0.1% | |
| У | 2 | < 0.1% | |
| T | 2 | < 0.1% | |
| V | 2 | < 0.1% | |
| G | 2 | < 0.1% | |
| B | 2 | < 0.1% | |
| Í | 1 | < 0.1% | |
| H | 1 | < 0.1% | |
| S | 1 | < 0.1% |
Most frequent Lowercase Letter characters
| Value | Count | Frequency (%) | |
| s | 4456 | 16.3% | |
| n | 4291 | 15.7% | |
| i | 4279 | 15.6% | |
| l | 4233 | 15.5% | |
| h | 4168 | 15.2% | |
| g | 4131 | 15.1% | |
| a | 430 | 1.6% | |
| o | 150 | 0.5% | |
| r | 147 | 0.5% | |
| t | 123 | 0.4% | |
| e | 117 | 0.4% | |
| ç | 108 | 0.4% | |
| u | 97 | 0.4% | |
| p | 87 | 0.3% | |
| ñ | 84 | 0.3% | |
| с | 65 | 0.2% | |
| k | 63 | 0.2% | |
| c | 61 | 0.2% | |
| к | 37 | 0.1% | |
| и | 35 | 0.1% | |
| й | 33 | 0.1% | |
| у | 31 | 0.1% | |
| ý | 30 | 0.1% | |
| ê | 17 | 0.1% | |
| d | 13 | < 0.1% | |
| Other values (28) | 93 | 0.3% |
Most frequent Other Letter characters
| Value | Count | Frequency (%) | |
| 话 | 60 | 7.9% | |
| 州 | 56 | 7.3% | |
| ا | 37 | 4.8% | |
| ر | 37 | 4.8% | |
| ل | 33 | 4.3% | |
| ع | 33 | 4.3% | |
| ب | 33 | 4.3% | |
| ي | 33 | 4.3% | |
| ة | 33 | 4.3% | |
| 普 | 32 | 4.2% | |
| 通 | 32 | 4.2% | |
| 广 | 28 | 3.7% | |
| 廣 | 28 | 3.7% | |
| 話 | 28 | 3.7% | |
| 日 | 23 | 3.0% | |
| 本 | 23 | 3.0% | |
| 語 | 23 | 3.0% | |
| ह | 22 | 2.9% | |
| न | 22 | 2.9% | |
| द | 22 | 2.9% | |
| 한 | 8 | 1.0% | |
| 국 | 8 | 1.0% | |
| 어 | 8 | 1.0% | |
| 조 | 8 | 1.0% | |
| 선 | 8 | 1.0% | |
| Other values (22) | 85 | 11.1% |
Most frequent Space Separator characters
| Value | Count | Frequency (%) | |
| (space) | 62 | 100.0% |
Most frequent Other Punctuation characters
| Value | Count | Frequency (%) | |
| / | 36 | 85.7% | |
| ? | 6 | 14.3% |
Most frequent Nonspacing Mark characters
| Value | Count | Frequency (%) | |
| ् | 22 | 52.4% | |
| ִ | 12 | 28.6% | |
| ְ | 6 | 14.3% | |
| ் | 2 | 4.8% |
Most frequent Spacing Mark characters
| Value | Count | Frequency (%) | |
| ि | 22 | 44.9% | |
| ी | 22 | 44.9% | |
| ி | 2 | 4.1% | |
| া | 2 | 4.1% | |
| ং | 1 | 2.0% |
Most occurring scripts
| Value | Count | Frequency (%) | |
| Latin | 31945 | 96.3% | |
| Han | 333 | 1.0% | |
| Arabic | 250 | 0.8% | |
| Cyrillic | 221 | 0.7% | |
| Devanagari | 132 | 0.4% | |
| Common | 104 | 0.3% | |
| Hebrew | 48 | 0.1% | |
| Hangul | 48 | 0.1% | |
| Greek | 40 | 0.1% | |
| Thai | 28 | 0.1% | |
| Tamil | 10 | < 0.1% | |
| Bengali | 5 | < 0.1% |
Most frequent Latin characters
| Value | Count | Frequency (%) | |
| s | 4456 | 13.9% | |
| n | 4291 | 13.4% | |
| i | 4279 | 13.4% | |
| l | 4233 | 13.3% | |
| E | 4188 | 13.1% | |
| h | 4168 | 13.0% | |
| g | 4131 | 12.9% | |
| a | 430 | 1.3% | |
| o | 150 | 0.5% | |
| r | 147 | 0.5% | |
| t | 123 | 0.4% | |
| e | 117 | 0.4% | |
| N | 110 | 0.3% | |
| F | 108 | 0.3% | |
| ç | 108 | 0.3% | |
| K | 101 | 0.3% | |
| U | 98 | 0.3% | |
| u | 97 | 0.3% | |
| p | 87 | 0.3% | |
| ñ | 84 | 0.3% | |
| D | 73 | 0.2% | |
| k | 63 | 0.2% | |
| c | 61 | 0.2% | |
| P | 51 | 0.2% | |
| I | 32 | 0.1% | |
| Other values (25) | 159 | 0.5% |
Most frequent Han characters
| Value | Count | Frequency (%) | |
| 话 | 60 | 18.0% | |
| 州 | 56 | 16.8% | |
| 普 | 32 | 9.6% | |
| 通 | 32 | 9.6% | |
| 广 | 28 | 8.4% | |
| 廣 | 28 | 8.4% | |
| 話 | 28 | 8.4% | |
| 日 | 23 | 6.9% | |
| 本 | 23 | 6.9% | |
| 語 | 23 | 6.9% |
Most frequent Arabic characters
| Value | Count | Frequency (%) | |
| ا | 37 | 14.8% | |
| ر | 37 | 14.8% | |
| ل | 33 | 13.2% | |
| ع | 33 | 13.2% | |
| ب | 33 | 13.2% | |
| ي | 33 | 13.2% | |
| ة | 33 | 13.2% | |
| ف | 3 | 1.2% | |
| س | 3 | 1.2% | |
| ی | 3 | 1.2% | |
| د | 1 | 0.4% | |
| و | 1 | 0.4% |
Most frequent Common characters
| Value | Count | Frequency (%) | |
| (space) | 62 | 59.6% | |
| / | 36 | 34.6% | |
| ? | 6 | 5.8% |
Most frequent Cyrillic characters
| Value | Count | Frequency (%) | |
| с | 65 | 29.4% | |
| к | 37 | 16.7% | |
| и | 35 | 15.8% | |
| й | 33 | 14.9% | |
| у | 31 | 14.0% | |
| а | 3 | 1.4% | |
| р | 3 | 1.4% | |
| У | 2 | 0.9% | |
| ї | 2 | 0.9% | |
| н | 2 | 0.9% | |
| ь | 2 | 0.9% | |
| б | 1 | 0.5% | |
| ъ | 1 | 0.5% | |
| л | 1 | 0.5% | |
| г | 1 | 0.5% | |
| е | 1 | 0.5% | |
| з | 1 | 0.5% |
Most frequent Greek characters
| Value | Count | Frequency (%) | |
| λ | 10 | 25.0% | |
| ε | 5 | 12.5% | |
| η | 5 | 12.5% | |
| ν | 5 | 12.5% | |
| ι | 5 | 12.5% | |
| κ | 5 | 12.5% | |
| ά | 5 | 12.5% |
Most frequent Hebrew characters
| Value | Count | Frequency (%) | |
| ִ | 12 | 25.0% | |
| ע | 6 | 12.5% | |
| ב | 6 | 12.5% | |
| ְ | 6 | 12.5% | |
| ר | 6 | 12.5% | |
| י | 6 | 12.5% | |
| ת | 6 | 12.5% |
Most frequent Devanagari characters
| Value | Count | Frequency (%) | |
| ह | 22 | 16.7% | |
| ि | 22 | 16.7% | |
| न | 22 | 16.7% | |
| ् | 22 | 16.7% | |
| द | 22 | 16.7% | |
| ी | 22 | 16.7% |
Most frequent Hangul characters
| Value | Count | Frequency (%) | |
| 한 | 8 | 16.7% | |
| 국 | 8 | 16.7% | |
| 어 | 8 | 16.7% | |
| 조 | 8 | 16.7% | |
| 선 | 8 | 16.7% | |
| 말 | 8 | 16.7% |
Most frequent Thai characters
| Value | Count | Frequency (%) | |
| า | 8 | 28.6% | |
| ภ | 4 | 14.3% | |
| ษ | 4 | 14.3% | |
| ไ | 4 | 14.3% | |
| ท | 4 | 14.3% | |
| ย | 4 | 14.3% |
Most frequent Tamil characters
| Value | Count | Frequency (%) | |
| த | 2 | 20.0% | |
| ம | 2 | 20.0% | |
| ி | 2 | 20.0% | |
| ழ | 2 | 20.0% | |
| ் | 2 | 20.0% |
Most frequent Bengali characters
| Value | Count | Frequency (%) | |
| া | 2 | 40.0% | |
| ব | 1 | 20.0% | |
| ং | 1 | 20.0% | |
| ল | 1 | 20.0% |
Most occurring blocks
| Value | Count | Frequency (%) | |
| ASCII | 31767 | 95.8% | |
| CJK | 333 | 1.0% | |
| None | 318 | 1.0% | |
| Arabic | 250 | 0.8% | |
| Cyrillic | 221 | 0.7% | |
| Devanagari | 132 | 0.4% | |
| Hebrew | 48 | 0.1% | |
| Hangul | 48 | 0.1% | |
| Thai | 28 | 0.1% | |
| Tamil | 10 | < 0.1% | |
| Bengali | 5 | < 0.1% | |
| Latin Ext Additional | 4 | < 0.1% |
Most frequent ASCII characters
| Value | Count | Frequency (%) | |
| s | 4456 | 14.0% | |
| n | 4291 | 13.5% | |
| i | 4279 | 13.5% | |
| l | 4233 | 13.3% | |
| E | 4188 | 13.2% | |
| h | 4168 | 13.1% | |
| g | 4131 | 13.0% | |
| a | 430 | 1.4% | |
| o | 150 | 0.5% | |
| r | 147 | 0.5% | |
| t | 123 | 0.4% | |
| e | 117 | 0.4% | |
| N | 110 | 0.3% | |
| F | 108 | 0.3% | |
| K | 101 | 0.3% | |
| U | 98 | 0.3% | |
| u | 97 | 0.3% | |
| p | 87 | 0.3% | |
| D | 73 | 0.2% | |
| k | 63 | 0.2% | |
| (space) | 62 | 0.2% | |
| c | 61 | 0.2% | |
| P | 51 | 0.2% | |
| / | 36 | 0.1% | |
| I | 32 | 0.1% | |
| Other values (18) | 75 | 0.2% |
Most frequent None characters
| Value | Count | Frequency (%) | |
| ç | 108 | 34.0% | |
| ñ | 84 | 26.4% | |
| Č | 30 | 9.4% | |
| ý | 30 | 9.4% | |
| ê | 17 | 5.3% | |
| λ | 10 | 3.1% | |
| ε | 5 | 1.6% | |
| η | 5 | 1.6% | |
| ν | 5 | 1.6% | |
| ι | 5 | 1.6% | |
| κ | 5 | 1.6% | |
| ά | 5 | 1.6% | |
| â | 4 | 1.3% | |
| ă | 4 | 1.3% | |
| Í | 1 | 0.3% |
Most frequent CJK characters
| Value | Count | Frequency (%) | |
| 话 | 60 | 18.0% | |
| 州 | 56 | 16.8% | |
| 普 | 32 | 9.6% | |
| 通 | 32 | 9.6% | |
| 广 | 28 | 8.4% | |
| 廣 | 28 | 8.4% | |
| 話 | 28 | 8.4% | |
| 日 | 23 | 6.9% | |
| 本 | 23 | 6.9% | |
| 語 | 23 | 6.9% |
Most frequent Arabic characters
| Value | Count | Frequency (%) | |
| ا | 37 | 14.8% | |
| ر | 37 | 14.8% | |
| ل | 33 | 13.2% | |
| ع | 33 | 13.2% | |
| ب | 33 | 13.2% | |
| ي | 33 | 13.2% | |
| ة | 33 | 13.2% | |
| ف | 3 | 1.2% | |
| س | 3 | 1.2% | |
| ی | 3 | 1.2% | |
| د | 1 | 0.4% | |
| و | 1 | 0.4% |
Most frequent Cyrillic characters
| Value | Count | Frequency (%) | |
| с | 65 | 29.4% | |
| к | 37 | 16.7% | |
| и | 35 | 15.8% | |
| й | 33 | 14.9% | |
| у | 31 | 14.0% | |
| а | 3 | 1.4% | |
| р | 3 | 1.4% | |
| У | 2 | 0.9% | |
| ї | 2 | 0.9% | |
| н | 2 | 0.9% | |
| ь | 2 | 0.9% | |
| б | 1 | 0.5% | |
| ъ | 1 | 0.5% | |
| л | 1 | 0.5% | |
| г | 1 | 0.5% | |
| е | 1 | 0.5% | |
| з | 1 | 0.5% |
Most frequent Hebrew characters
| Value | Count | Frequency (%) | |
| ִ | 12 | 25.0% | |
| ע | 6 | 12.5% | |
| ב | 6 | 12.5% | |
| ְ | 6 | 12.5% | |
| ר | 6 | 12.5% | |
| י | 6 | 12.5% | |
| ת | 6 | 12.5% |
Most frequent Devanagari characters
| Value | Count | Frequency (%) | |
| ह | 22 | 16.7% | |
| ि | 22 | 16.7% | |
| न | 22 | 16.7% | |
| ् | 22 | 16.7% | |
| द | 22 | 16.7% | |
| ी | 22 | 16.7% |
Most frequent Hangul characters
| Value | Count | Frequency (%) | |
| 한 | 8 | 16.7% | |
| 국 | 8 | 16.7% | |
| 어 | 8 | 16.7% | |
| 조 | 8 | 16.7% | |
| 선 | 8 | 16.7% | |
| 말 | 8 | 16.7% |
Most frequent Thai characters
| Value | Count | Frequency (%) | |
| า | 8 | 28.6% | |
| ภ | 4 | 14.3% | |
| ษ | 4 | 14.3% | |
| ไ | 4 | 14.3% | |
| ท | 4 | 14.3% | |
| ย | 4 | 14.3% |
Most frequent Latin Ext Additional characters
| Value | Count | Frequency (%) | |
| ế | 2 | 50.0% | |
| ệ | 2 | 50.0% |
Most frequent Tamil characters
| Value | Count | Frequency (%) | |
| த | 2 | 20.0% | |
| ம | 2 | 20.0% | |
| ி | 2 | 20.0% | |
| ழ | 2 | 20.0% | |
| ் | 2 | 20.0% |
Most frequent Bengali characters
| Value | Count | Frequency (%) | |
| া | 2 | 40.0% | |
| ব | 1 | 20.0% | |
| ং | 1 | 20.0% | |
| ল | 1 | 20.0% |
| Distinct | 4801 |
|---|---|
| Distinct (%) | > 99.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 37.6 KiB |
| Out of the Blue | 2 |
|---|---|
| Batman | 2 |
| Slither | 1 |
| The Benchwarmers | 1 |
| Knocked Up | 1 |
| Other values (4796) | 4796 |
| Value | Count | Frequency (%) | |
| Out of the Blue | 2 | < 0.1% | |
| Batman | 2 | < 0.1% | |
| Slither | 1 | < 0.1% | |
| The Benchwarmers | 1 | < 0.1% | |
| Knocked Up | 1 | < 0.1% | |
| Fever Pitch | 1 | < 0.1% | |
| Miracle at St. Anna | 1 | < 0.1% | |
| Bandidas | 1 | < 0.1% | |
| R100 | 1 | < 0.1% | |
| Appaloosa | 1 | < 0.1% | |
| キャプテンハーロック | 1 | < 0.1% | |
| The Railway Man | 1 | < 0.1% | |
| Boyhood | 1 | < 0.1% | |
| Killer Elite | 1 | < 0.1% | |
| Kansas City | 1 | < 0.1% | |
| The Grudge | 1 | < 0.1% | |
| The Yellow Handkerchief | 1 | < 0.1% | |
| Truth or Dare | 1 | < 0.1% | |
| Wimbledon | 1 | < 0.1% | |
| Love & Basketball | 1 | < 0.1% | |
| The Bounty Hunter | 1 | < 0.1% | |
| Kung Pow: Enter the Fist | 1 | < 0.1% | |
| Wonderland | 1 | < 0.1% | |
| Bully | 1 | < 0.1% | |
| Return to the Blue Lagoon | 1 | < 0.1% | |
| Other values (4776) | 4776 | 99.4% |
Frequencies of value counts
Unique
| Unique | 4799 |
|---|---|
| Unique (%) | 99.9% |
Histogram of lengths of the category
Length
| Max length | 86 |
|---|---|
| Median length | 14 |
| Mean length | 15.22298563 |
| Min length | 1 |
Most occurring characters
| Value | Count | Frequency (%) | |
| (space) | 8436 | 11.5% | |
| e | 7412 | 10.1% | |
| a | 4625 | 6.3% | |
| o | 4370 | 6.0% | |
| r | 3892 | 5.3% | |
| n | 3885 | 5.3% | |
| i | 3723 | 5.1% | |
| t | 3600 | 4.9% | |
| s | 2857 | 3.9% | |
| h | 2717 | 3.7% | |
| l | 2426 | 3.3% | |
| d | 1760 | 2.4% | |
| T | 1562 | 2.1% | |
| u | 1525 | 2.1% | |
| c | 1179 | 1.6% | |
| g | 1137 | 1.6% | |
| y | 1074 | 1.5% | |
| m | 1066 | 1.5% | |
| S | 972 | 1.3% | |
| f | 815 | 1.1% | |
| M | 777 | 1.1% | |
| B | 733 | 1.0% | |
| p | 695 | 1.0% | |
| D | 680 | 0.9% | |
| C | 638 | 0.9% | |
| Other values (385) | 10560 | 14.4% |
Most occurring categories
| Value | Count | Frequency (%) | |
| Lowercase Letter | 51401 | 70.3% | |
| Uppercase Letter | 11413 | 15.6% | |
| Space Separator | 8436 | 11.5% | |
| Other Punctuation | 906 | 1.2% | |
| Decimal Number | 494 | 0.7% | |
| Other Letter | 328 | 0.4% | |
| Dash Punctuation | 84 | 0.1% | |
| Spacing Mark | 16 | < 0.1% | |
| Nonspacing Mark | 8 | < 0.1% | |
| Open Punctuation | 6 | < 0.1% | |
| Close Punctuation | 6 | < 0.1% | |
| Currency Symbol | 5 | < 0.1% | |
| Other Number | 4 | < 0.1% | |
| Final Punctuation | 3 | < 0.1% | |
| Math Symbol | 2 | < 0.1% | |
| Modifier Letter | 1 | < 0.1% | |
| Connector Punctuation | 1 | < 0.1% | |
| Format | 1 | < 0.1% | |
| Other Symbol | 1 | < 0.1% |
Most frequent Uppercase Letter characters
| Value | Count | Frequency (%) | |
| T | 1562 | 13.7% | |
| S | 972 | 8.5% | |
| M | 777 | 6.8% | |
| B | 733 | 6.4% | |
| D | 680 | 6.0% | |
| C | 638 | 5.6% | |
| A | 618 | 5.4% | |
| L | 565 | 5.0% | |
| H | 526 | 4.6% | |
| P | 470 | 4.1% | |
| W | 469 | 4.1% | |
| R | 454 | 4.0% | |
| G | 447 | 3.9% | |
| I | 443 | 3.9% | |
| F | 435 | 3.8% | |
| E | 302 | 2.6% | |
| N | 262 | 2.3% | |
| O | 212 | 1.9% | |
| J | 191 | 1.7% | |
| K | 176 | 1.5% | |
| V | 142 | 1.2% | |
| Y | 125 | 1.1% | |
| U | 111 | 1.0% | |
| Z | 42 | 0.4% | |
| Q | 25 | 0.2% | |
| Other values (12) | 36 | 0.3% |
Most frequent Lowercase Letter characters
| Value | Count | Frequency (%) | |
| e | 7412 | 14.4% | |
| a | 4625 | 9.0% | |
| o | 4370 | 8.5% | |
| r | 3892 | 7.6% | |
| n | 3885 | 7.6% | |
| i | 3723 | 7.2% | |
| t | 3600 | 7.0% | |
| s | 2857 | 5.6% | |
| h | 2717 | 5.3% | |
| l | 2426 | 4.7% | |
| d | 1760 | 3.4% | |
| u | 1525 | 3.0% | |
| c | 1179 | 2.3% | |
| g | 1137 | 2.2% | |
| y | 1074 | 2.1% | |
| m | 1066 | 2.1% | |
| f | 815 | 1.6% | |
| p | 695 | 1.4% | |
| k | 635 | 1.2% | |
| v | 577 | 1.1% | |
| w | 472 | 0.9% | |
| b | 463 | 0.9% | |
| x | 141 | 0.3% | |
| z | 94 | 0.2% | |
| j | 54 | 0.1% | |
| Other values (55) | 207 | 0.4% |
Most frequent Space Separator characters
| Value | Count | Frequency (%) | |
| (space) | 8436 | 100.0% |
Most frequent Other Punctuation characters
| Value | Count | Frequency (%) | |
| : | 341 | 37.6% | |
| ' | 222 | 24.5% | |
| . | 141 | 15.6% | |
| , | 75 | 8.3% | |
| & | 60 | 6.6% | |
| ! | 35 | 3.9% | |
| ? | 18 | 2.0% | |
| / | 7 | 0.8% | |
| # | 2 | 0.2% | |
| * | 2 | 0.2% | |
| · | 1 | 0.1% | |
| ・ | 1 | 0.1% | |
| ‧ | 1 | 0.1% |
Most frequent Dash Punctuation characters
| Value | Count | Frequency (%) | |
| - | 82 | 97.6% | |
| – | 2 | 2.4% |
Most frequent Decimal Number characters
| Value | Count | Frequency (%) | |
| 2 | 144 | 29.1% | |
| 1 | 78 | 15.8% | |
| 3 | 74 | 15.0% | |
| 0 | 74 | 15.0% | |
| 4 | 35 | 7.1% | |
| 8 | 21 | 4.3% | |
| 5 | 21 | 4.3% | |
| 9 | 17 | 3.4% | |
| 7 | 15 | 3.0% | |
| 6 | 15 | 3.0% |
Most frequent Other Letter characters
| Value | Count | Frequency (%) | |
| の | 7 | 2.1% | |
| ا | 7 | 2.1% | |
| ی | 6 | 1.8% | |
| 三 | 5 | 1.5% | |
| 城 | 4 | 1.2% | |
| 的 | 4 | 1.2% | |
| क | 4 | 1.2% | |
| ن | 4 | 1.2% | |
| 十 | 3 | 0.9% | |
| 一 | 3 | 0.9% | |
| キ | 3 | 0.9% | |
| 人 | 3 | 0.9% | |
| 놈 | 3 | 0.9% | |
| س | 3 | 0.9% | |
| ر | 3 | 0.9% | |
| د | 3 | 0.9% | |
| ه | 3 | 0.9% | |
| ン | 2 | 0.6% | |
| ラ | 2 | 0.6% | |
| 金 | 2 | 0.6% | |
| 天 | 2 | 0.6% | |
| 雄 | 2 | 0.6% | |
| 師 | 2 | 0.6% | |
| 记 | 2 | 0.6% | |
| 之 | 2 | 0.6% | |
| Other values (214) | 244 | 74.4% |
Most frequent Other Number characters
| Value | Count | Frequency (%) | |
| ³ | 1 | 25.0% | |
| ⅓ | 1 | 25.0% | |
| ½ | 1 | 25.0% | |
| ² | 1 | 25.0% |
Most frequent Currency Symbol characters
| Value | Count | Frequency (%) | |
| $ | 3 | 60.0% | |
| ¢ | 2 | 40.0% |
Most frequent Modifier Letter characters
| Value | Count | Frequency (%) | |
| ー | 1 | 100.0% |
Most frequent Spacing Mark characters
| Value | Count | Frequency (%) | |
| ी | 5 | 31.2% | |
| ா | 3 | 18.8% | |
| ा | 3 | 18.8% | |
| ு | 2 | 12.5% | |
| ि | 2 | 12.5% | |
| ो | 1 | 6.2% |
Most frequent Nonspacing Mark characters
| Value | Count | Frequency (%) | |
| ุ | 2 | 25.0% | |
| ้ | 2 | 25.0% | |
| ் | 1 | 12.5% | |
| ृ | 1 | 12.5% | |
| ิ | 1 | 12.5% | |
| े | 1 | 12.5% |
Most frequent Math Symbol characters
| Value | Count | Frequency (%) | |
| + | 2 | 100.0% |
Most frequent Final Punctuation characters
| Value | Count | Frequency (%) | |
| ’ | 3 | 100.0% |
Most frequent Open Punctuation characters
| Value | Count | Frequency (%) | |
| ( | 4 | 66.7% | |
| [ | 2 | 33.3% |
Most frequent Close Punctuation characters
| Value | Count | Frequency (%) | |
| ) | 4 | 66.7% | |
| ] | 2 | 33.3% |
Most frequent Connector Punctuation characters
| Value | Count | Frequency (%) | |
| _ | 1 | 100.0% |
Most frequent Format characters
| Value | Count | Frequency (%) | |
| (non-printing character) | 1 | 100.0% |
Most frequent Other Symbol characters
| Value | Count | Frequency (%) | |
| ° | 1 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) | |
| Latin | 62677 | 85.7% | |
| Common | 9949 | 13.6% | |
| Han | 160 | 0.2% | |
| Cyrillic | 127 | 0.2% | |
| Hangul | 45 | 0.1% | |
| Arabic | 39 | 0.1% | |
| Devanagari | 35 | < 0.1% | |
| Katakana | 26 | < 0.1% | |
| Hiragana | 17 | < 0.1% | |
| Thai | 17 | < 0.1% | |
| Tamil | 13 | < 0.1% | |
| Greek | 10 | < 0.1% | |
| Inherited | 1 | < 0.1% |
Most frequent Latin characters
| Value | Count | Frequency (%) | |
| e | 7412 | 11.8% | |
| a | 4625 | 7.4% | |
| o | 4370 | 7.0% | |
| r | 3892 | 6.2% | |
| n | 3885 | 6.2% | |
| i | 3723 | 5.9% | |
| t | 3600 | 5.7% | |
| s | 2857 | 4.6% | |
| h | 2717 | 4.3% | |
| l | 2426 | 3.9% | |
| d | 1760 | 2.8% | |
| T | 1562 | 2.5% | |
| u | 1525 | 2.4% | |
| c | 1179 | 1.9% | |
| g | 1137 | 1.8% | |
| y | 1074 | 1.7% | |
| m | 1066 | 1.7% | |
| S | 972 | 1.6% | |
| f | 815 | 1.3% | |
| M | 777 | 1.2% | |
| B | 733 | 1.2% | |
| p | 695 | 1.1% | |
| D | 680 | 1.1% | |
| C | 638 | 1.0% | |
| k | 635 | 1.0% | |
| Other values (47) | 7922 | 12.6% |
Most frequent Common characters
| Value | Count | Frequency (%) | |
| (space) | 8436 | 84.8% | |
| : | 341 | 3.4% | |
| ' | 222 | 2.2% | |
| 2 | 144 | 1.4% | |
| . | 141 | 1.4% | |
| - | 82 | 0.8% | |
| 1 | 78 | 0.8% | |
| , | 75 | 0.8% | |
| 3 | 74 | 0.7% | |
| 0 | 74 | 0.7% | |
| & | 60 | 0.6% | |
| 4 | 35 | 0.4% | |
| ! | 35 | 0.4% | |
| 8 | 21 | 0.2% | |
| 5 | 21 | 0.2% | |
| ? | 18 | 0.2% | |
| 9 | 17 | 0.2% | |
| 7 | 15 | 0.2% | |
| 6 | 15 | 0.2% | |
| / | 7 | 0.1% | |
| ( | 4 | < 0.1% | |
| ) | 4 | < 0.1% | |
| ’ | 3 | < 0.1% | |
| $ | 3 | < 0.1% | |
| ¢ | 2 | < 0.1% | |
| Other values (16) | 22 | 0.2% |
Most frequent Katakana characters
| Value | Count | Frequency (%) | |
| キ | 3 | 11.5% | |
| ン | 2 | 7.7% | |
| ラ | 2 | 7.7% | |
| ハ | 2 | 7.7% | |
| ア | 2 | 7.7% | |
| シ | 1 | 3.8% | |
| ゴ | 1 | 3.8% | |
| ジ | 1 | 3.8% | |
| ポ | 1 | 3.8% | |
| ニ | 1 | 3.8% | |
| ョ | 1 | 3.8% | |
| ャ | 1 | 3.8% | |
| プ | 1 | 3.8% | |
| テ | 1 | 3.8% | |
| ロ | 1 | 3.8% | |
| ッ | 1 | 3.8% | |
| ク | 1 | 3.8% | |
| ウ | 1 | 3.8% | |
| ル | 1 | 3.8% | |
| ュ | 1 | 3.8% |
Most frequent Han characters
| Value | Count | Frequency (%) | |
| 三 | 5 | 3.1% | |
| 城 | 4 | 2.5% | |
| 的 | 4 | 2.5% | |
| 十 | 3 | 1.9% | |
| 一 | 3 | 1.9% | |
| 人 | 3 | 1.9% | |
| 金 | 2 | 1.2% | |
| 天 | 2 | 1.2% | |
| 雄 | 2 | 1.2% | |
| 師 | 2 | 1.2% | |
| 记 | 2 | 1.2% | |
| 之 | 2 | 1.2% | |
| 黃 | 2 | 1.2% | |
| 甲 | 2 | 1.2% | |
| 個 | 2 | 1.2% | |
| 林 | 2 | 1.2% | |
| 石 | 2 | 1.2% | |
| 七 | 2 | 1.2% | |
| 龙 | 2 | 1.2% | |
| 千 | 2 | 1.2% | |
| 風 | 2 | 1.2% | |
| 暴 | 2 | 1.2% | |
| 奇 | 2 | 1.2% | |
| 南 | 2 | 1.2% | |
| 京 | 2 | 1.2% | |
| Other values (98) | 100 | 62.5% |
Most frequent Cyrillic characters
| Value | Count | Frequency (%) | |
| о | 18 | 14.2% | |
| а | 10 | 7.9% | |
| е | 10 | 7.9% | |
| р | 9 | 7.1% | |
| н | 8 | 6.3% | |
| л | 7 | 5.5% | |
| и | 6 | 4.7% | |
| в | 6 | 4.7% | |
| С | 5 | 3.9% | |
| д | 4 | 3.1% | |
| к | 4 | 3.1% | |
| б | 4 | 3.1% | |
| г | 4 | 3.1% | |
| с | 3 | 2.4% | |
| з | 3 | 2.4% | |
| я | 2 | 1.6% | |
| т | 2 | 1.6% | |
| ы | 2 | 1.6% | |
| у | 2 | 1.6% | |
| п | 2 | 1.6% | |
| ц | 1 | 0.8% | |
| Б | 1 | 0.8% | |
| З | 1 | 0.8% | |
| ё | 1 | 0.8% | |
| М | 1 | 0.8% | |
| Other values (11) | 11 | 8.7% |
Most frequent Hiragana characters
| Value | Count | Frequency (%) | |
| の | 7 | 41.2% | |
| だ | 2 | 11.8% | |
| く | 1 | 5.9% | |
| も | 1 | 5.9% | |
| け | 1 | 5.9% | |
| と | 1 | 5.9% | |
| し | 1 | 5.9% | |
| ま | 1 | 5.9% | |
| あ | 1 | 5.9% | |
| よ | 1 | 5.9% |
Most frequent Hangul characters
| Value | Count | Frequency (%) | |
| 놈 | 3 | 6.7% | |
| 상 | 2 | 4.4% | |
| 이 | 2 | 4.4% | |
| 한 | 2 | 4.4% | |
| 디 | 1 | 2.2% | |
| 워 | 1 | 2.2% | |
| 해 | 1 | 2.2% | |
| 운 | 1 | 2.2% | |
| 대 | 1 | 2.2% | |
| 인 | 1 | 2.2% | |
| 천 | 1 | 2.2% | |
| 륙 | 1 | 2.2% | |
| 작 | 1 | 2.2% | |
| 전 | 1 | 2.2% | |
| 태 | 1 | 2.2% | |
| 극 | 1 | 2.2% | |
| 기 | 1 | 2.2% | |
| 휘 | 1 | 2.2% | |
| 날 | 1 | 2.2% | |
| 리 | 1 | 2.2% | |
| 며 | 1 | 2.2% | |
| 괴 | 1 | 2.2% | |
| 물 | 1 | 2.2% | |
| 좋 | 1 | 2.2% | |
| 은 | 1 | 2.2% | |
| Other values (15) | 15 | 33.3% |
Most frequent Tamil characters
| Value | Count | Frequency (%) | |
| ா | 3 | 23.1% | |
| ன | 2 | 15.4% | |
| ு | 2 | 15.4% | |
| ர | 1 | 7.7% | |
| ம | 1 | 7.7% | |
| ஜ | 1 | 7.7% | |
| ் | 1 | 7.7% | |
| வ | 1 | 7.7% | |
| ல | 1 | 7.7% |
Most frequent Devanagari characters
| Value | Count | Frequency (%) | |
| ी | 5 | 14.3% | |
| क | 4 | 11.4% | |
| ा | 3 | 8.6% | |
| भ | 2 | 5.7% | |
| ल | 2 | 5.7% | |
| ि | 2 | 5.7% | |
| द | 2 | 5.7% | |
| न | 2 | 5.7% | |
| ह | 2 | 5.7% | |
| अ | 1 | 2.9% | |
| व | 1 | 2.9% | |
| ृ | 1 | 2.9% | |
| ष | 1 | 2.9% | |
| ए | 1 | 2.9% | |
| ब | 1 | 2.9% | |
| स | 1 | 2.9% | |
| ड | 1 | 2.9% | |
| ज | 1 | 2.9% | |
| ो | 1 | 2.9% | |
| े | 1 | 2.9% |
Most frequent Thai characters
| Value | Count | Frequency (%) | |
| ุ | 2 | 11.8% | |
| ย | 2 | 11.8% | |
| ้ | 2 | 11.8% | |
| ส | 1 | 5.9% | |
| ร | 1 | 5.9% | |
| ิ | 1 | 5.9% | |
| โ | 1 | 5.9% | |
| ไ | 1 | 5.9% | |
| ท | 1 | 5.9% | |
| ต | 1 | 5.9% | |
| ม | 1 | 5.9% | |
| ำ | 1 | 5.9% | |
| ก | 1 | 5.9% | |
| ง | 1 | 5.9% |
Most frequent Arabic characters
| Value | Count | Frequency (%) | |
| ا | 7 | 17.9% | |
| ی | 6 | 15.4% | |
| ن | 4 | 10.3% | |
| س | 3 | 7.7% | |
| ر | 3 | 7.7% | |
| د | 3 | 7.7% | |
| ه | 3 | 7.7% | |
| ب | 2 | 5.1% | |
| م | 2 | 5.1% | |
| ك | 1 | 2.6% | |
| ت | 1 | 2.6% | |
| ج | 1 | 2.6% | |
| ز | 1 | 2.6% | |
| چ | 1 | 2.6% | |
| آ | 1 | 2.6% |
Most frequent Greek characters
| Value | Count | Frequency (%) | |
| ν | 2 | 20.0% | |
| Κ | 1 | 10.0% | |
| υ | 1 | 10.0% | |
| ό | 1 | 10.0% | |
| δ | 1 | 10.0% | |
| ο | 1 | 10.0% | |
| τ | 1 | 10.0% | |
| α | 1 | 10.0% | |
| ς | 1 | 10.0% |
Most frequent Inherited characters
| Value | Count | Frequency (%) | |
| (non-printing character) | 1 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) | |
| ASCII | 72554 | 99.2% | |
| CJK | 160 | 0.2% | |
| Cyrillic | 127 | 0.2% | |
| None | 72 | 0.1% | |
| Hangul | 45 | 0.1% | |
| Arabic | 39 | 0.1% | |
| Devanagari | 35 | < 0.1% | |
| Katakana | 28 | < 0.1% | |
| Hiragana | 17 | < 0.1% | |
| Thai | 17 | < 0.1% | |
| Tamil | 13 | < 0.1% | |
| Punctuation | 7 | < 0.1% | |
| Number Forms | 1 | < 0.1% | |
| Latin Ext Additional | 1 | < 0.1% |
Most frequent ASCII characters
| Value | Count | Frequency (%) | |
| (space) | 8436 | 11.6% | |
| e | 7412 | 10.2% | |
| a | 4625 | 6.4% | |
| o | 4370 | 6.0% | |
| r | 3892 | 5.4% | |
| n | 3885 | 5.4% | |
| i | 3723 | 5.1% | |
| t | 3600 | 5.0% | |
| s | 2857 | 3.9% | |
| h | 2717 | 3.7% | |
| l | 2426 | 3.3% | |
| d | 1760 | 2.4% | |
| T | 1562 | 2.2% | |
| u | 1525 | 2.1% | |
| c | 1179 | 1.6% | |
| g | 1137 | 1.6% | |
| y | 1074 | 1.5% | |
| m | 1066 | 1.5% | |
| S | 972 | 1.3% | |
| f | 815 | 1.1% | |
| M | 777 | 1.1% | |
| B | 733 | 1.0% | |
| p | 695 | 1.0% | |
| D | 680 | 0.9% | |
| C | 638 | 0.9% | |
| Other values (56) | 9998 | 13.8% |
Most frequent None characters
| Value | Count | Frequency (%) | |
| é | 18 | 25.0% | |
| à | 4 | 5.6% | |
| è | 4 | 5.6% | |
| ó | 4 | 5.6% | |
| á | 3 | 4.2% | |
| í | 3 | 4.2% | |
| å | 3 | 4.2% | |
| ü | 2 | 2.8% | |
| ¢ | 2 | 2.8% | |
| ñ | 2 | 2.8% | |
| ă | 2 | 2.8% | |
| ø | 2 | 2.8% | |
| ν | 2 | 2.8% | |
| · | 1 | 1.4% | |
| É | 1 | 1.4% | |
| ³ | 1 | 1.4% | |
| Æ | 1 | 1.4% | |
| ç | 1 | 1.4% | |
| ½ | 1 | 1.4% | |
| ë | 1 | 1.4% | |
| ² | 1 | 1.4% | |
| ư | 1 | 1.4% | |
| î | 1 | 1.4% | |
| ș | 1 | 1.4% | |
| ų | 1 | 1.4% | |
| Other values (9) | 9 | 12.5% |
Most frequent Katakana characters
| Value | Count | Frequency (%) | |
| キ | 3 | 10.7% | |
| ン | 2 | 7.1% | |
| ラ | 2 | 7.1% | |
| ハ | 2 | 7.1% | |
| ア | 2 | 7.1% | |
| シ | 1 | 3.6% | |
| ・ | 1 | 3.6% | |
| ゴ | 1 | 3.6% | |
| ジ | 1 | 3.6% | |
| ポ | 1 | 3.6% | |
| ニ | 1 | 3.6% | |
| ョ | 1 | 3.6% | |
| ャ | 1 | 3.6% | |
| プ | 1 | 3.6% | |
| テ | 1 | 3.6% | |
| ー | 1 | 3.6% | |
| ロ | 1 | 3.6% | |
| ッ | 1 | 3.6% | |
| ク | 1 | 3.6% | |
| ウ | 1 | 3.6% | |
| ル | 1 | 3.6% | |
| ュ | 1 | 3.6% |
Most frequent CJK characters
| Value | Count | Frequency (%) | |
| 三 | 5 | 3.1% | |
| 城 | 4 | 2.5% | |
| 的 | 4 | 2.5% | |
| 十 | 3 | 1.9% | |
| 一 | 3 | 1.9% | |
| 人 | 3 | 1.9% | |
| 金 | 2 | 1.2% | |
| 天 | 2 | 1.2% | |
| 雄 | 2 | 1.2% | |
| 師 | 2 | 1.2% | |
| 记 | 2 | 1.2% | |
| 之 | 2 | 1.2% | |
| 黃 | 2 | 1.2% | |
| 甲 | 2 | 1.2% | |
| 個 | 2 | 1.2% | |
| 林 | 2 | 1.2% | |
| 石 | 2 | 1.2% | |
| 七 | 2 | 1.2% | |
| 龙 | 2 | 1.2% | |
| 千 | 2 | 1.2% | |
| 風 | 2 | 1.2% | |
| 暴 | 2 | 1.2% | |
| 奇 | 2 | 1.2% | |
| 南 | 2 | 1.2% | |
| 京 | 2 | 1.2% | |
| Other values (98) | 100 | 62.5% |
Most frequent Cyrillic characters
| Value | Count | Frequency (%) | |
| о | 18 | 14.2% | |
| а | 10 | 7.9% | |
| е | 10 | 7.9% | |
| р | 9 | 7.1% | |
| н | 8 | 6.3% | |
| л | 7 | 5.5% | |
| и | 6 | 4.7% | |
| в | 6 | 4.7% | |
| С | 5 | 3.9% | |
| д | 4 | 3.1% | |
| к | 4 | 3.1% | |
| б | 4 | 3.1% | |
| г | 4 | 3.1% | |
| с | 3 | 2.4% | |
| з | 3 | 2.4% | |
| я | 2 | 1.6% | |
| т | 2 | 1.6% | |
| ы | 2 | 1.6% | |
| у | 2 | 1.6% | |
| п | 2 | 1.6% | |
| ц | 1 | 0.8% | |
| Б | 1 | 0.8% | |
| З | 1 | 0.8% | |
| ё | 1 | 0.8% | |
| М | 1 | 0.8% | |
| Other values (11) | 11 | 8.7% |
Most frequent Hiragana characters
| Value | Count | Frequency (%) | |
| の | 7 | 41.2% | |
| だ | 2 | 11.8% | |
| く | 1 | 5.9% | |
| も | 1 | 5.9% | |
| け | 1 | 5.9% | |
| と | 1 | 5.9% | |
| し | 1 | 5.9% | |
| ま | 1 | 5.9% | |
| あ | 1 | 5.9% | |
| よ | 1 | 5.9% |
Most frequent Hangul characters
| Value | Count | Frequency (%) | |
| 놈 | 3 | 6.7% | |
| 상 | 2 | 4.4% | |
| 이 | 2 | 4.4% | |
| 한 | 2 | 4.4% | |
| 디 | 1 | 2.2% | |
| 워 | 1 | 2.2% | |
| 해 | 1 | 2.2% | |
| 운 | 1 | 2.2% | |
| 대 | 1 | 2.2% | |
| 인 | 1 | 2.2% | |
| 천 | 1 | 2.2% | |
| 륙 | 1 | 2.2% | |
| 작 | 1 | 2.2% | |
| 전 | 1 | 2.2% | |
| 태 | 1 | 2.2% | |
| 극 | 1 | 2.2% | |
| 기 | 1 | 2.2% | |
| 휘 | 1 | 2.2% | |
| 날 | 1 | 2.2% | |
| 리 | 1 | 2.2% | |
| 며 | 1 | 2.2% | |
| 괴 | 1 | 2.2% | |
| 물 | 1 | 2.2% | |
| 좋 | 1 | 2.2% | |
| 은 | 1 | 2.2% | |
| Other values (15) | 15 | 33.3% |
Most frequent Number Forms characters
| Value | Count | Frequency (%) | |
| ⅓ | 1 | 100.0% |
Most frequent Tamil characters
| Value | Count | Frequency (%) | |
| ா | 3 | 23.1% | |
| ன | 2 | 15.4% | |
| ு | 2 | 15.4% | |
| ர | 1 | 7.7% | |
| ம | 1 | 7.7% | |
| ஜ | 1 | 7.7% | |
| ் | 1 | 7.7% | |
| வ | 1 | 7.7% | |
| ல | 1 | 7.7% |
Most frequent Punctuation characters
| Value | Count | Frequency (%) | |
| ’ | 3 | 42.9% | |
| – | 2 | 28.6% | |
| ‧ | 1 | 14.3% | |
| (non-printing character) | 1 | 14.3% |
Most frequent Devanagari characters
| Value | Count | Frequency (%) | |
| ी | 5 | 14.3% | |
| क | 4 | 11.4% | |
| ा | 3 | 8.6% | |
| भ | 2 | 5.7% | |
| ल | 2 | 5.7% | |
| ि | 2 | 5.7% | |
| द | 2 | 5.7% | |
| न | 2 | 5.7% | |
| ह | 2 | 5.7% | |
| अ | 1 | 2.9% | |
| व | 1 | 2.9% | |
| ृ | 1 | 2.9% | |
| ष | 1 | 2.9% | |
| ए | 1 | 2.9% | |
| ब | 1 | 2.9% | |
| स | 1 | 2.9% | |
| ड | 1 | 2.9% | |
| ज | 1 | 2.9% | |
| ो | 1 | 2.9% | |
| े | 1 | 2.9% |
Most frequent Thai characters
| Value | Count | Frequency (%) | |
| ุ | 2 | 11.8% | |
| ย | 2 | 11.8% | |
| ้ | 2 | 11.8% | |
| ส | 1 | 5.9% | |
| ร | 1 | 5.9% | |
| ิ | 1 | 5.9% | |
| โ | 1 | 5.9% | |
| ไ | 1 | 5.9% | |
| ท | 1 | 5.9% | |
| ต | 1 | 5.9% | |
| ม | 1 | 5.9% | |
| ำ | 1 | 5.9% | |
| ก | 1 | 5.9% | |
| ง | 1 | 5.9% |
Most frequent Arabic characters
| Value | Count | Frequency (%) | |
| ا | 7 | 17.9% | |
| ی | 6 | 15.4% | |
| ن | 4 | 10.3% | |
| س | 3 | 7.7% | |
| ر | 3 | 7.7% | |
| د | 3 | 7.7% | |
| ه | 3 | 7.7% | |
| ب | 2 | 5.1% | |
| م | 2 | 5.1% | |
| ك | 1 | 2.6% | |
| ت | 1 | 2.6% | |
| ج | 1 | 2.6% | |
| ز | 1 | 2.6% | |
| چ | 1 | 2.6% | |
| آ | 1 | 2.6% |
Most frequent Latin Ext Additional characters
| Value | Count | Frequency (%) | |
| ợ | 1 | 100.0% |
| Distinct | 4801 |
|---|---|
| Distinct (%) | > 99.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 37.6 KiB |
| UNK | 3 |
|---|---|
| It's been many years since Freddy Krueger's first victim, Nancy, came face-to-face with Freddy and his sadistic, evil ways. Now, Nancy's all grown up; she's put her frightening nightmares behind her and is helping teens cope with their dreams. Too bad Freddy's decided to herald his return by invading the kids' dreams and scaring them into committing suicide. | 1 |
| A Savage beast, grown to monstrous size and driven mad by toxic wastes that are poisoning the waters, spreads terror and death on a Maine countryside. | 1 |
| The characters we met a little more than a decade ago are returning to East Great Falls for their high-school reunion. In one long-overdue weekend, they will discover what has changed, who hasn’t and that time and distance can’t break the bonds of friendship. It was summer 1999 when four small-town Michigan boys began a quest to lose their virginity. In the years that have passed, Jim and Michelle married while Kevin and Vicky said goodbye. Oz and Heather grew apart, but Finch still longs for Stifler’s mom. Now these lifelong friends have come home as adults to reminisce about – and get inspired by – the hormonal teens who launched a comedy legend. | 1 |
| Arrogant, self-centered movie director Guido Contini finds himself struggling to find meaning, purpose, and a script for his latest film endeavor. With only a week left before shooting begins, he desperately searches for answers and inspiration from his wife, his mistress, his muse, and his mother. | 1 |
| Other values (4796) | 4796 |
| Value | Count | Frequency (%) | |
| UNK | 3 | 0.1% | |
| It's been many years since Freddy Krueger's first victim, Nancy, came face-to-face with Freddy and his sadistic, evil ways. Now, Nancy's all grown up; she's put her frightening nightmares behind her and is helping teens cope with their dreams. Too bad Freddy's decided to herald his return by invading the kids' dreams and scaring them into committing suicide. | 1 | < 0.1% | |
| A Savage beast, grown to monstrous size and driven mad by toxic wastes that are poisoning the waters, spreads terror and death on a Maine countryside. | 1 | < 0.1% | |
| The characters we met a little more than a decade ago are returning to East Great Falls for their high-school reunion. In one long-overdue weekend, they will discover what has changed, who hasn’t and that time and distance can’t break the bonds of friendship. It was summer 1999 when four small-town Michigan boys began a quest to lose their virginity. In the years that have passed, Jim and Michelle married while Kevin and Vicky said goodbye. Oz and Heather grew apart, but Finch still longs for Stifler’s mom. Now these lifelong friends have come home as adults to reminisce about – and get inspired by – the hormonal teens who launched a comedy legend. | 1 | < 0.1% | |
| Arrogant, self-centered movie director Guido Contini finds himself struggling to find meaning, purpose, and a script for his latest film endeavor. With only a week left before shooting begins, he desperately searches for answers and inspiration from his wife, his mistress, his muse, and his mother. | 1 | < 0.1% | |
| On the night of 16 July 1942, ten year old Sarah and her parents are being arrested and transported to the Velodrome d'Hiver in Paris where thousands of other jews are being send to get deported. Sarah however managed to lock her little brother in a closed just before the police entered their appartment.Sixty years later, Julia Jarmond, an American journalist in Paris, gets the assignment to write an article about this raid, a black page in the history of France. She starts digging archives and through Sarah's file discovers a well kept secret about her own in-laws. | 1 | < 0.1% | |
| The true story of British athletes preparing for and competing in the 1924 Summer Olympics. | 1 | < 0.1% | |
| Summertime on the coast of Maine, "In the Bedroom" centers on the inner dynamics of a family in transition. Matt Fowler is a doctor practicing in his native Maine and is married to New York born Ruth Fowler, a music teacher. He is involved in a love affair with a local single mother. As the beauty of Maine's brief and fleeting summer comes to an end, these characters find themselves in the midst of unimaginable tragedy. | 1 | < 0.1% | |
| We always knew they were coming back. Using recovered alien technology, the nations of Earth have collaborated on an immense defense program to protect the planet. But nothing can prepare us for the aliens’ advanced and unprecedented force. Only the ingenuity of a few brave men and women can bring our world back from the brink of extinction. | 1 | < 0.1% | |
| Manhattanite Ashley is known to many as the luckiest woman around. After a chance encounter with a down-and-out young man, however, she realizes that she's swapped her fortune for his. | 1 | < 0.1% | |
| Medical researcher Frank, his fiancee Zoe and their team have achieved the impossible: they have found a way to revive the dead. After a successful, but unsanctioned, experiment on a lifeless animal, they are ready to make their work public. However, when their dean learns what they've done, he shuts them down. Zoe is killed during an attempt to recreate the experiment, leading Frank to test the process on her. Zoe is revived -- but something evil is within her. | 1 | < 0.1% | |
| Jason Kelly is one week away from marrying his boss's uber-controlling daughter, putting him on the fast track for a partnership at the law firm. However, when the straight-laced Jason is tricked into driving his foul-mouthed grandfather, Dick, to Daytona for spring break, his pending nuptials are suddenly in jeopardy. Between riotous frat parties, bar fights, and an epic night of karaoke, Dick is on a quest to live his life to the fullest and bring Jason along for the ride. | 1 | < 0.1% | |
| A sparkling comedic chronicle of a middle-class young man’s romantic misadventures among New York City’s debutante society. Stillman’s deft, literate dialogue and hilariously highbrow observations earned this debut film an Academy Award nomination for Best Original Screenplay. Alongside the wit and sophistication, though, lies a tender tale of adolescent anxiety. | 1 | < 0.1% | |
| A rich college kid is taught a lesson after a joy ride ends up destroying a country restaurant. | 1 | < 0.1% | |
| In Colombia just after the Great War, an old man falls from a ladder; dying, he professes great love for his wife. After the funeral, a man calls on the widow - she dismisses him angrily. Flash back more than 50 years to the day Florentino Ariza, a telegraph boy, falls in love with Fermina Daza, the daughter of a mule trader. | 1 | < 0.1% | |
| In a beauty salon in Beirut the lives of five women cross paths. The beauty salon is a colorful and sensual microcosm where they share and entrust their hopes, fears and expectations. | 1 | < 0.1% | |
| When The Man in the Yellow Hat befriends Curious George in the jungle, they set off on a non-stop, fun-filled journey through the wonders of the big city toward the warmth of true friendship. | 1 | < 0.1% | |
| Set in the late 19th century. When a ruthless robber baron takes away everything they cherish, a rough-and-tumble, idealistic peasant and a sophisticated heiress embark on a quest for justice, vengeance…and a few good heists. | 1 | < 0.1% | |
| A timid magazine photo manager who lives life vicariously through daydreams embarks on a true-life adventure when a negative goes missing. | 1 | < 0.1% | |
| Katherine Morrissey, a former Christian missionary, lost her faith after the tragic deaths of her family. Now she applies her expertise to debunking religious phenomena. When a series of biblical plagues overrun a small town, Katherine arrives to prove that a supernatural force is not behind the occurrences, but soon finds that science cannot explain what is happening. Instead, she must regain her faith to combat the evil that waits in a Louisiana swamp. | 1 | < 0.1% | |
| Raju, a waiter, is in love with the famous TV reporter Greeta Kapoor. After a man is murdered, Kapoor shows up at Raju's door to ask him some questions - it turns out that Raju served the dead man his last supper, and the authorities hope that he might be able to help them. Raju lies and says that he was an eye witness, in order to spend more time with Kapoor. He gives the police a false description of the killer, but it matches his best friend Kutti, so soon Kutti is wanted by the police, and the Mafia, who is responsible for the killing, is after Raju. | 1 | < 0.1% | |
| A law firm brings in its 'fixer' to remedy the situation after a lawyer has a breakdown while representing a chemical company that he knows is guilty in a multi-billion dollar class action suit. | 1 | < 0.1% | |
| When Dustin's girlfriend, Alexis, breaks up with him, he employs his best buddy, Tank, to take her out on the worst rebound date imaginable in the hopes that it will send her running back into his arms. But when Tank begins to really fall for Alexis, he finds himself in an impossible position. | 1 | < 0.1% | |
| Fugitives of the Federation for their daring rescue of Spock from the doomed Genesis Planet, Admiral Kirk (William Shatner) and his crew begin their journey home to face justice for their actions. But as they near Earth, they find it at the mercy of a mysterious alien presence whose signals are slowly destroying the planet. In a desperate attempt to answer the call of the probe, Kirk and his crew race back to the late twentieth century. However they soon find the world they once knew to be more alien than anything they've encountered in the far reaches of the galaxy! | 1 | < 0.1% | |
| Pete Sandidge (Tracy), a daredevil bomber pilot, dies when he crashes his plane into a German aircraft carrier, leaving his devoted girlfriend, Dorinda (Irene Dunne), who is also a pilot, heartbroken. In heaven, Pete receives a new assignment: he is to become the guardian angel for Ted Randall (Van Johnson), a young Army flyer. Invisibly, Pete guides Ted through flight school and into combat, but the ectoplasmic mentor's tolerance is tested when Ted falls for Dorinda. Ultimately, however, Pete not only comes to terms with their relationship but also acts as Dorinda's copilot when she undertakes a dangerous bombing raid, so that Ted won't have to. Remade by Steven Speilberg in 1989 as ALWAYS | 1 | < 0.1% | |
| Other values (4776) | 4776 | 99.4% |
Frequencies of value counts
Unique
| Unique | 4800 |
|---|---|
| Unique (%) | 99.9% |
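The unique count above (4800) is one lower than the 4801 distinct values reported for overview in the alert list. A likely reading, assuming the usual profiling definitions, is that "distinct" counts every value seen at least once while "unique" counts only values seen exactly once. The sketch below illustrates that distinction with the standard library; the function name and the toy column are hypothetical and not taken from the report.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (distinct, unique): values seen at least once vs. exactly once."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique


# Hypothetical toy column: "b" repeats, so it is distinct but not unique.
sample = ["a", "b", "b", "c"]
print(distinct_and_unique(sample))  # (3, 2)
```

Under this reading, a single repeated value is enough to make the two counts differ by one.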
Histogram of lengths of the category
Length
| Max length | 1000 |
|---|---|
| Median length | 283 |
| Mean length | 305.2098688 |
| Min length | 1 |
Most occurring characters
| Value | Count | Frequency (%) | |
| (space) | 245712 | 16.8% | |
| e | 141178 | 9.6% | |
| t | 97104 | 6.6% | |
| a | 94907 | 6.5% | |
| i | 85329 | 5.8% | |
| o | 84373 | 5.8% | |
| n | 84015 | 5.7% | |
| s | 78343 | 5.3% | |
| r | 77154 | 5.3% | |
| h | 61948 | 4.2% | |
| l | 48681 | 3.3% | |
| d | 41552 | 2.8% | |
| c | 32477 | 2.2% | |
| u | 29838 | 2.0% | |
| m | 28240 | 1.9% | |
| f | 26398 | 1.8% | |
| g | 25430 | 1.7% | |
| y | 20387 | 1.4% | |
| p | 20108 | 1.4% | |
| w | 19108 | 1.3% | |
| b | 15814 | 1.1% | |
| , | 13388 | 0.9% | |
| v | 12791 | 0.9% | |
| . | 12056 | 0.8% | |
| k | 9150 | 0.6% | |
| Other values (102) | 60442 | 4.1% |
Most occurring categories
| Value | Count | Frequency (%) | |
| Lowercase Letter | 1139845 | 77.8% | |
| Space Separator | 245716 | 16.8% | |
| Uppercase Letter | 39048 | 2.7% | |
| Other Punctuation | 30988 | 2.1% | |
| Dash Punctuation | 4474 | 0.3% | |
| Decimal Number | 3915 | 0.3% | |
| Open Punctuation | 757 | 0.1% | |
| Close Punctuation | 754 | 0.1% | |
| Final Punctuation | 310 | < 0.1% | |
| Initial Punctuation | 49 | < 0.1% | |
| Currency Symbol | 46 | < 0.1% | |
| Math Symbol | 5 | < 0.1% | |
| Other Symbol | 4 | < 0.1% | |
| Connector Punctuation | 3 | < 0.1% | |
| Format | 3 | < 0.1% | |
| Control | 3 | < 0.1% | |
| Other Number | 1 | < 0.1% | |
| Modifier Letter | 1 | < 0.1% | |
| Modifier Symbol | 1 | < 0.1% |
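The category labels in this table correspond to Unicode general categories (for example `Ll` for Lowercase Letter and `Zs` for Space Separator). As a rough sketch of how such a tally could be reproduced, the snippet below counts categories with Python's `unicodedata` module. The label mapping is deliberately partial, and the function name is an assumption for illustration rather than the profiler's own code.

```python
import unicodedata
from collections import Counter

# Readable labels for a few general-category codes; other codes are kept as-is.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
    "Sm": "Math Symbol",
}


def category_counts(texts):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for text in texts:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# Hypothetical sample strings, not rows from the dataset.
print(category_counts(["Ab 1,", "x|y-"]).most_common())
```

Note that the vertical bar falls under Math Symbol, which is why it appears in that table rather than under punctuation.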
Most frequent Uppercase Letter characters
| Value | Count | Frequency (%) | |
| A | 4448 | 11.4% | |
| T | 3268 | 8.4% | |
| S | 2934 | 7.5% | |
| B | 2671 | 6.8% | |
| C | 2467 | 6.3% | |
| M | 2307 | 5.9% | |
| W | 2098 | 5.4% | |
| H | 1808 | 4.6% | |
| D | 1682 | 4.3% | |
| I | 1604 | 4.1% | |
| J | 1591 | 4.1% | |
| R | 1460 | 3.7% | |
| L | 1394 | 3.6% | |
| P | 1321 | 3.4% | |
| F | 1315 | 3.4% | |
| E | 1209 | 3.1% | |
| N | 1161 | 3.0% | |
| G | 1052 | 2.7% | |
| K | 847 | 2.2% | |
| O | 688 | 1.8% | |
| V | 535 | 1.4% | |
| U | 489 | 1.3% | |
| Y | 413 | 1.1% | |
| Z | 128 | 0.3% | |
| Q | 110 | 0.3% | |
| Other values (3) | 48 | 0.1% |
Most frequent Lowercase Letter characters
| Value | Count | Frequency (%) | |
| e | 141178 | 12.4% | |
| t | 97104 | 8.5% | |
| a | 94907 | 8.3% | |
| i | 85329 | 7.5% | |
| o | 84373 | 7.4% | |
| n | 84015 | 7.4% | |
| s | 78343 | 6.9% | |
| r | 77154 | 6.8% | |
| h | 61948 | 5.4% | |
| l | 48681 | 4.3% | |
| d | 41552 | 3.6% | |
| c | 32477 | 2.8% | |
| u | 29838 | 2.6% | |
| m | 28240 | 2.5% | |
| f | 26398 | 2.3% | |
| g | 25430 | 2.2% | |
| y | 20387 | 1.8% | |
| p | 20108 | 1.8% | |
| w | 19108 | 1.7% | |
| b | 15814 | 1.4% | |
| v | 12791 | 1.1% | |
| k | 9150 | 0.8% | |
| x | 2113 | 0.2% | |
| j | 1299 | 0.1% | |
| z | 1196 | 0.1% | |
| Other values (21) | 912 | 0.1% |
Most frequent Space Separator characters
| Value | Count | Frequency (%) | |
| (space) | 245712 | > 99.9% | |
| (non-ASCII space) | 4 | < 0.1% | |
Most frequent Decimal Number characters
| Value | Count | Frequency (%) | |
| 1 | 926 | 23.7% | |
| 0 | 821 | 21.0% | |
| 9 | 568 | 14.5% | |
| 2 | 371 | 9.5% | |
| 5 | 231 | 5.9% | |
| 7 | 224 | 5.7% | |
| 8 | 206 | 5.3% | |
| 3 | 199 | 5.1% | |
| 4 | 189 | 4.8% | |
| 6 | 180 | 4.6% |
Most frequent Other Punctuation characters
| Value | Count | Frequency (%) | |
| , | 13388 | 43.2% | |
| . | 12056 | 38.9% | |
| ' | 3577 | 11.5% | |
| " | 1021 | 3.3% | |
| : | 284 | 0.9% | |
| ? | 218 | 0.7% | |
| ; | 175 | 0.6% | |
| ! | 142 | 0.5% | |
| / | 57 | 0.2% | |
| … | 35 | 0.1% | |
| & | 29 | 0.1% | |
| · | 2 | < 0.1% | |
| # | 2 | < 0.1% | |
| ¡ | 1 | < 0.1% | |
| % | 1 | < 0.1% |
Most frequent Final Punctuation characters
| Value | Count | Frequency (%) | |
| ’ | 275 | 88.7% | |
| ” | 35 | 11.3% |
Most frequent Dash Punctuation characters
| Value | Count | Frequency (%) | |
| - | 4206 | 94.0% | |
| – | 205 | 4.6% | |
| — | 62 | 1.4% | |
| ― | 1 | < 0.1% |
Most frequent Open Punctuation characters
| Value | Count | Frequency (%) | |
| ( | 755 | 99.7% | |
| [ | 2 | 0.3% |
Most frequent Close Punctuation characters
| Value | Count | Frequency (%) | |
| ) | 753 | 99.9% | |
| ] | 1 | 0.1% |
Most frequent Initial Punctuation characters
| Value | Count | Frequency (%) | |
| “ | 35 | 71.4% | |
| ‘ | 14 | 28.6% |
Most frequent Other Number characters
| Value | Count | Frequency (%) | |
| ¹ | 1 | 100.0% |
Most frequent Currency Symbol characters
| Value | Count | Frequency (%) | |
| $ | 45 | 97.8% | |
| £ | 1 | 2.2% |
Most frequent Math Symbol characters
| Value | Count | Frequency (%) | |
| + | 2 | 40.0% | |
| − | 1 | 20.0% | |
| \| | 1 | 20.0% | |
| ~ | 1 | 20.0% |
Most frequent Other Symbol characters
| Value | Count | Frequency (%) | |
| ® | 3 | 75.0% | |
| ¦ | 1 | 25.0% |
Most frequent Connector Punctuation characters
| Value | Count | Frequency (%) | |
| _ | 3 | 100.0% |
Most frequent Format characters
| Value | Count | Frequency (%) | |
| (invisible format character) | 3 | 100.0% | |
Most frequent Control characters
| Value | Count | Frequency (%) | |
| (control character) | 3 | 100.0% | |
Most frequent Modifier Letter characters
| Value | Count | Frequency (%) | |
| ʼ | 1 | 100.0% |
Most frequent Modifier Symbol characters
| Value | Count | Frequency (%) | |
| ` | 1 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) | |
| Latin | 1178893 | 80.4% | |
| Common | 287030 | 19.6% |
Most frequent Latin characters
| Value | Count | Frequency (%) | |
| e | 141178 | 12.0% | |
| t | 97104 | 8.2% | |
| a | 94907 | 8.1% | |
| i | 85329 | 7.2% | |
| o | 84373 | 7.2% | |
| n | 84015 | 7.1% | |
| s | 78343 | 6.6% | |
| r | 77154 | 6.5% | |
| h | 61948 | 5.3% | |
| l | 48681 | 4.1% | |
| d | 41552 | 3.5% | |
| c | 32477 | 2.8% | |
| u | 29838 | 2.5% | |
| m | 28240 | 2.4% | |
| f | 26398 | 2.2% | |
| g | 25430 | 2.2% | |
| y | 20387 | 1.7% | |
| p | 20108 | 1.7% | |
| w | 19108 | 1.6% | |
| b | 15814 | 1.3% | |
| v | 12791 | 1.1% | |
| k | 9150 | 0.8% | |
| A | 4448 | 0.4% | |
| T | 3268 | 0.3% | |
| S | 2934 | 0.2% | |
| Other values (49) | 33918 | 2.9% |
Most frequent Common characters
| Value | Count | Frequency (%) | |
| (space) | 245712 | 85.6% | |
| , | 13388 | 4.7% | |
| . | 12056 | 4.2% | |
| - | 4206 | 1.5% | |
| ' | 3577 | 1.2% | |
| " | 1021 | 0.4% | |
| 1 | 926 | 0.3% | |
| 0 | 821 | 0.3% | |
| ( | 755 | 0.3% | |
| ) | 753 | 0.3% | |
| 9 | 568 | 0.2% | |
| 2 | 371 | 0.1% | |
| : | 284 | 0.1% | |
| ’ | 275 | 0.1% | |
| 5 | 231 | 0.1% | |
| 7 | 224 | 0.1% | |
| ? | 218 | 0.1% | |
| 8 | 206 | 0.1% | |
| – | 205 | 0.1% | |
| 3 | 199 | 0.1% | |
| 4 | 189 | 0.1% | |
| 6 | 180 | 0.1% | |
| ; | 175 | 0.1% | |
| ! | 142 | < 0.1% | |
| — | 62 | < 0.1% | |
| Other values (28) | 286 | 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) | |
| ASCII | 1465088 | 99.9% | |
| Punctuation | 662 | < 0.1% | |
| None | 170 | < 0.1% | |
| Alphabetic PF | 1 | < 0.1% | |
| Math Operators | 1 | < 0.1% | |
| Modifier Letters | 1 | < 0.1% |
Most frequent ASCII characters
| Value | Count | Frequency (%) | |
| (space) | 245712 | 16.8% | |
| e | 141178 | 9.6% | |
| t | 97104 | 6.6% | |
| a | 94907 | 6.5% | |
| i | 85329 | 5.8% | |
| o | 84373 | 5.8% | |
| n | 84015 | 5.7% | |
| s | 78343 | 5.3% | |
| r | 77154 | 5.3% | |
| h | 61948 | 4.2% | |
| l | 48681 | 3.3% | |
| d | 41552 | 2.8% | |
| c | 32477 | 2.2% | |
| u | 29838 | 2.0% | |
| m | 28240 | 1.9% | |
| f | 26398 | 1.8% | |
| g | 25430 | 1.7% | |
| y | 20387 | 1.4% | |
| p | 20108 | 1.4% | |
| w | 19108 | 1.3% | |
| b | 15814 | 1.1% | |
| , | 13388 | 0.9% | |
| v | 12791 | 0.9% | |
| . | 12056 | 0.8% | |
| k | 9150 | 0.6% | |
| Other values (62) | 59607 | 4.1% |
Most frequent Punctuation characters
| Value | Count | Frequency (%) | |
| ’ | 275 | 41.5% | |
| – | 205 | 31.0% | |
| — | 62 | 9.4% | |
| … | 35 | 5.3% | |
| “ | 35 | 5.3% | |
| ” | 35 | 5.3% | |
| ‘ | 14 | 2.1% | |
| ― | 1 | 0.2% |
Most frequent None characters
| Value | Count | Frequency (%) | |
| é | 92 | 54.1% | |
| á | 12 | 7.1% | |
| ó | 8 | 4.7% | |
| (non-ASCII space) | 4 | 2.4% | |
| ï | 4 | 2.4% | |
| è | 4 | 2.4% | |
| ç | 4 | 2.4% | |
| ö | 4 | 2.4% | |
| í | 4 | 2.4% | |
| î | 3 | 1.8% | |
| ü | 3 | 1.8% | |
| à | 3 | 1.8% | |
| ñ | 3 | 1.8% | |
| ® | 3 | 1.8% | |
| (invisible character) | 3 | 1.8% | |
| · | 2 | 1.2% | |
| ø | 2 | 1.2% | |
| ¹ | 1 | 0.6% | |
| ô | 1 | 0.6% | |
| ë | 1 | 0.6% | |
| Æ | 1 | 0.6% | |
| Â | 1 | 0.6% | |
| ¡ | 1 | 0.6% | |
| ¦ | 1 | 0.6% | |
| £ | 1 | 0.6% | |
| Other values (4) | 4 | 2.4% |
Most frequent Alphabetic PF characters
| Value | Count | Frequency (%) | |
| ﬁ | 1 | 100.0% | |
Most frequent Math Operators characters
| Value | Count | Frequency (%) | |
| − | 1 | 100.0% |
Most frequent Modifier Letters characters
| Value | Count | Frequency (%) | |
| ʼ | 1 | 100.0% |
popularity
Real number (ℝ≥0)
| Distinct | 4802 |
|---|---|
| Distinct (%) | > 99.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 21.49230059 |
|---|---|
| Minimum | 0 |
| Maximum | 875.581305 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Memory size | 37.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.3628167 |
| Q1 | 4.66807 |
| median | 12.921594 |
| Q3 | 28.3135045 |
| 95-th percentile | 67.3859622 |
| Maximum | 875.581305 |
| Range | 875.581305 |
| Interquartile range (IQR) | 23.6454345 |
Descriptive statistics
| Standard deviation | 31.81664975 |
|---|---|
| Coefficient of variation (CV) | 1.480374314 |
| Kurtosis | 191.9958205 |
| Mean | 21.49230059 |
| Median Absolute Deviation (MAD) | 9.814445 |
| Skewness | 9.721415886 |
| Sum | 103227.5197 |
| Variance | 1012.299201 |
| Monotonicity | Not monotonic |
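Several of the figures above follow from one another: the IQR is Q3 minus Q1 (28.3135045 − 4.66807 = 23.6454345), the coefficient of variation is the standard deviation divided by the mean (31.81664975 / 21.49230059 ≈ 1.4804), and the variance is the standard deviation squared. The sketch below recomputes the same kinds of statistics with Python's `statistics` module. It assumes the sample (n − 1) standard deviation and linear-interpolation quartiles, which may not match the profiler's exact settings, and the function name is hypothetical.

```python
import statistics


def summarize(values):
    """Spread statistics of the kind listed in the report, for a list of numbers."""
    data = sorted(values)
    # "inclusive" interpolates linearly between the observed min and max.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    mean = statistics.fmean(data)
    std = statistics.stdev(data)  # sample standard deviation (n - 1)
    return {
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
        "cv": std / mean,
        # Median absolute deviation around the median.
        "mad": statistics.median(abs(x - median) for x in data),
        "variance": statistics.variance(data),
    }


# Hypothetical toy sample with one large value, loosely mimicking a right skew.
print(summarize([1, 2, 3, 4, 10]))
```

A coefficient of variation above 1, as reported here, means the standard deviation exceeds the mean, which is consistent with the strong right skew and the long tail up to 875.58.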
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
| 8.902102 | 2 | < 0.1% | |
| 16.251204 | 1 | < 0.1% | |
| 18.47242 | 1 | < 0.1% | |
| 9.779444 | 1 | < 0.1% | |
| 10.142218 | 1 | < 0.1% | |
| 1.569246 | 1 | < 0.1% | |
| 27.65527 | 1 | < 0.1% | |
| 121.463076 | 1 | < 0.1% | |
| 16.032594 | 1 | < 0.1% | |
| 0.118324 | 1 | < 0.1% | |
| 80.171283 | 1 | < 0.1% | |
| 36.238968 | 1 | < 0.1% | |
| 24.657931 | 1 | < 0.1% | |
| 44.529429 | 1 | < 0.1% | |
| 8.265317 | 1 | < 0.1% | |
| 10.439971 | 1 | < 0.1% | |
| 93.067866 | 1 | < 0.1% | |
| 55.659988 | 1 | < 0.1% | |
| 33.649652 | 1 | < 0.1% | |
| 4.289003 | 1 | < 0.1% | |
| 0.887821 | 1 | < 0.1% | |
| 1.71729 | 1 | < 0.1% | |
| 39.004588 | 1 | < 0.1% | |
| 19.524972 | 1 | < 0.1% | |
| 1.777148 | 1 | < 0.1% | |
| Other values (4777) | 4777 | 99.5% |
| Value | Count | Frequency (%) | |
| 0 | 1 | < 0.1% | |
| 0.000372 | 1 | < 0.1% | |
| 0.001117 | 1 | < 0.1% | |
| 0.001186 | 1 | < 0.1% | |
| 0.001389 | 1 | < 0.1% | |
| 0.001586 | 1 | < 0.1% | |
| 0.002386 | 1 | < 0.1% | |
| 0.002388 | 1 | < 0.1% | |
| 0.003142 | 1 | < 0.1% | |
| 0.003352 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 875.581305 | 1 | < 0.1% | |
| 724.247784 | 1 | < 0.1% | |
| 514.569956 | 1 | < 0.1% | |
| 481.098624 | 1 | < 0.1% | |
| 434.278564 | 1 | < 0.1% | |
| 418.708552 | 1 | < 0.1% | |
| 271.972889 | 1 | < 0.1% | |
| 243.791743 | 1 | < 0.1% | |
| 206.227151 | 1 | < 0.1% | |
| 203.73459 | 1 | < 0.1% |
production_companies
| Distinct | 3697 |
|---|---|
| Distinct (%) | 77.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 37.6 KiB |
| Value | Count | Frequency (%) | |
| [] | 351 | 7.3% | |
| [{'name': 'Paramount Pictures', 'id': 4}] | 58 | 1.2% | |
| [{'name': 'Universal Pictures', 'id': 33}] | 45 | 0.9% | |
| [{'name': 'New Line Cinema', 'id': 12}] | 38 | 0.8% | |
| [{'name': 'Columbia Pictures', 'id': 5}] | 37 | 0.8% | |
| [{'name': 'Metro-Goldwyn-Mayer (MGM)', 'id': 8411}] | 32 | 0.7% | |
| [{'name': 'Twentieth Century Fox Film Corporation', 'id': 306}] | 31 | 0.6% | |
| [{'name': 'Warner Bros.', 'id': 6194}] | 27 | 0.6% | |
| [{'name': 'Walt Disney Pictures', 'id': 2}] | 27 | 0.6% | |
| [{'name': 'Touchstone Pictures', 'id': 9195}] | 26 | 0.5% | |
| [{'name': 'Dimension Films', 'id': 7405}] | 17 | 0.4% | |
| [{'name': 'Miramax Films', 'id': 14}] | 16 | 0.3% | |
| [{'name': 'Columbia Pictures Corporation', 'id': 441}] | 16 | 0.3% | |
| [{'name': 'DreamWorks Animation', 'id': 521}] | 12 | 0.2% | |
| [{'name': 'United Artists', 'id': 60}] | 12 | 0.2% | |
| [{'name': 'Walt Disney Pictures', 'id': 2}, {'name': 'Pixar Animation Studios', 'id': 3}] | 11 | 0.2% | |
| [{'name': 'Fox 2000 Pictures', 'id': 711}] | 10 | 0.2% | |
| [{'name': 'Fox Searchlight Pictures', 'id': 43}] | 9 | 0.2% | |
| [{'name': 'Imagine Entertainment', 'id': 23}, {'name': 'Universal Pictures', 'id': 33}] | 9 | 0.2% | |
| [{'name': 'Walt Disney Pictures', 'id': 2}, {'name': 'Walt Disney Feature Animation', 'id': 10217}] | 9 | 0.2% | |
| [{'name': 'Lions Gate Films', 'id': 35}] | 8 | 0.2% | |
| [{'name': 'Blue Sky Studios', 'id': 9383}, {'name': 'Twentieth Century Fox Animation', 'id': 11749}] | 8 | 0.2% | |
| [{'name': 'Marvel Studios', 'id': 420}] | 8 | 0.2% | |
| [{'name': 'United Artists', 'id': 60}, {'name': 'Eon Productions', 'id': 7576}, {'name': 'Danjaq', 'id': 10761}] | 7 | 0.1% | |
| [{'name': 'Hollywood Pictures', 'id': 915}, {'name': 'Cinergi Pictures Entertainment', 'id': 1504}] | 7 | 0.1% | |
| Other values (3672) | 3972 | 82.7% |
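Each value in this column is a stringified list of dictionaries, so the table counts whole lists rather than individual companies: Walt Disney Pictures, for instance, appears both alone and combined with Pixar Animation Studios. A minimal sketch of counting per company instead, assuming the values are valid Python literals as they appear above (the function name and sample rows are illustrative):

```python
import ast
from collections import Counter


def company_counts(raw_values):
    """Count how many rows mention each company name."""
    counts = Counter()
    for raw in raw_values:
        # literal_eval safely parses the list-of-dicts text without executing code.
        for company in ast.literal_eval(raw):
            counts[company["name"]] += 1
    return counts


rows = [
    "[{'name': 'Walt Disney Pictures', 'id': 2}]",
    "[{'name': 'Walt Disney Pictures', 'id': 2}, {'name': 'Pixar Animation Studios', 'id': 3}]",
    "[]",
]
print(company_counts(rows).most_common(1))  # [('Walt Disney Pictures', 2)]
```

Parsing first would also let the empty list `[]` (351 rows) be treated as missing data rather than as the most common category.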
Frequencies of value counts
Unique
| Unique | 3497 |
|---|---|
| Unique (%) | 72.8% |
Histogram of lengths of the category
Length
| Max length | 1130 |
|---|---|
| Median length | 104 |
| Mean length | 127.542994 |
| Min length | 2 |
Most occurring characters
| Value | Count | Frequency (%) | |
| ' | 81993 | 13.4% | |
| (space) | 70935 | 11.6% | |
| i | 34526 | 5.6% | |
| e | 33841 | 5.5% | |
| n | 32605 | 5.3% | |
| a | 28312 | 4.6% | |
| : | 27355 | 4.5% | |
| , | 22976 | 3.8% | |
| m | 22254 | 3.6% | |
| d | 19859 | 3.2% | |
| t | 18386 | 3.0% | |
| r | 17181 | 2.8% | |
| o | 15560 | 2.5% | |
| { | 13677 | 2.2% | |
| } | 13677 | 2.2% | |
| s | 12762 | 2.1% | |
| l | 9347 | 1.5% | |
| u | 9295 | 1.5% | |
| 1 | 8424 | 1.4% | |
| c | 7557 | 1.2% | |
| 2 | 6454 | 1.1% | |
| P | 6167 | 1.0% | |
| 3 | 5935 | 1.0% | |
| 4 | 5520 | 0.9% | |
| 5 | 4941 | 0.8% | |
| Other values (81) | 83050 | 13.6% |
Most occurring categories
| Value | Count | Frequency (%) | |
| Lowercase Letter | 278719 | 45.5% | |
| Other Punctuation | 133646 | 21.8% | |
| Space Separator | 70935 | 11.6% | |
| Decimal Number | 53365 | 8.7% | |
| Uppercase Letter | 37442 | 6.1% | |
| Open Punctuation | 18933 | 3.1% | |
| Close Punctuation | 18933 | 3.1% | |
| Dash Punctuation | 500 | 0.1% | |
| Math Symbol | 113 | < 0.1% | |
| Other Number | 2 | < 0.1% | |
| Other Symbol | 1 | < 0.1% |
Most frequent Open Punctuation characters
| Value | Count | Frequency (%) | |
| { | 13677 | 72.2% | |
| [ | 4803 | 25.4% | |
| ( | 453 | 2.4% |
Most frequent Other Punctuation characters
| Value | Count | Frequency (%) | |
| ' | 81993 | 61.4% | |
| : | 27355 | 20.5% | |
| , | 22976 | 17.2% | |
| . | 835 | 0.6% | |
| / | 176 | 0.1% | |
| & | 161 | 0.1% | |
| " | 142 | 0.1% | |
| ! | 4 | < 0.1% | |
| @ | 2 | < 0.1% | |
| ? | 1 | < 0.1% | |
| % | 1 | < 0.1% |
Most frequent Lowercase Letter characters
| Value | Count | Frequency (%) | |
| i | 34526 | 12.4% | |
| e | 33841 | 12.1% | |
| n | 32605 | 11.7% | |
| a | 28312 | 10.2% | |
| m | 22254 | 8.0% | |
| d | 19859 | 7.1% | |
| t | 18386 | 6.6% | |
| r | 17181 | 6.2% | |
| o | 15560 | 5.6% | |
| s | 12762 | 4.6% | |
| l | 9347 | 3.4% | |
| u | 9295 | 3.3% | |
| c | 7557 | 2.7% | |
| y | 2648 | 1.0% | |
| h | 2520 | 0.9% | |
| p | 2387 | 0.9% | |
| g | 2099 | 0.8% | |
| v | 1478 | 0.5% | |
| k | 1396 | 0.5% | |
| w | 1301 | 0.5% | |
| b | 1206 | 0.4% | |
| f | 702 | 0.3% | |
| x | 699 | 0.3% | |
| é | 294 | 0.1% | |
| z | 195 | 0.1% | |
| Other values (18) | 309 | 0.1% |
Most frequent Space Separator characters
| Value | Count | Frequency (%) | |
| (space) | 70935 | 100.0% | |
Most frequent Uppercase Letter characters
| Value | Count | Frequency (%) | |
| P | 6167 | 16.5% | |
| F | 4436 | 11.8% | |
| C | 3700 | 9.9% | |
| M | 2429 | 6.5% | |
| E | 2337 | 6.2% | |
| S | 2241 | 6.0% | |
| T | 1644 | 4.4% | |
| B | 1566 | 4.2% | |
| A | 1427 | 3.8% | |
| G | 1409 | 3.8% | |
| D | 1353 | 3.6% | |
| W | 1252 | 3.3% | |
| I | 1223 | 3.3% | |
| R | 1156 | 3.1% | |
| L | 1114 | 3.0% | |
| N | 763 | 2.0% | |
| H | 662 | 1.8% | |
| U | 561 | 1.5% | |
| V | 547 | 1.5% | |
| K | 516 | 1.4% | |
| O | 381 | 1.0% | |
| J | 229 | 0.6% | |
| Z | 140 | 0.4% | |
| Y | 88 | 0.2% | |
| Q | 45 | 0.1% | |
| Other values (5) | 56 | 0.1% |
Most frequent Decimal Number characters
| Value | Count | Frequency (%) | |
| 1 | 8424 | 15.8% | |
| 2 | 6454 | 12.1% | |
| 3 | 5935 | 11.1% | |
| 4 | 5520 | 10.3% | |
| 5 | 4941 | 9.3% | |
| 6 | 4683 | 8.8% | |
| 0 | 4498 | 8.4% | |
| 7 | 4346 | 8.1% | |
| 8 | 4291 | 8.0% | |
| 9 | 4273 | 8.0% |
Most frequent Close Punctuation characters
| Value | Count | Frequency (%) | |
| } | 13677 | 72.2% | |
| ] | 4803 | 25.4% | |
| ) | 453 | 2.4% |
Most frequent Dash Punctuation characters
| Value | Count | Frequency (%) | |
| - | 500 | 100.0% |
Most frequent Other Number characters
| Value | Count | Frequency (%) | |
| ² | 1 | 50.0% | |
| ½ | 1 | 50.0% |
Most frequent Math Symbol characters
| Value | Count | Frequency (%) | |
| + | 113 | 100.0% |
Most frequent Other Symbol characters
| Value | Count | Frequency (%) | |
| ° | 1 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) | |
| Latin | 316161 | 51.6% | |
| Common | 296428 | 48.4% |
Most frequent Common characters
| Value | Count | Frequency (%) | |
| ' | 81993 | 27.7% | |
| (space) | 70935 | 23.9% | |
| : | 27355 | 9.2% | |
| , | 22976 | 7.8% | |
| { | 13677 | 4.6% | |
| } | 13677 | 4.6% | |
| 1 | 8424 | 2.8% | |
| 2 | 6454 | 2.2% | |
| 3 | 5935 | 2.0% | |
| 4 | 5520 | 1.9% | |
| 5 | 4941 | 1.7% | |
| [ | 4803 | 1.6% | |
| ] | 4803 | 1.6% | |
| 6 | 4683 | 1.6% | |
| 0 | 4498 | 1.5% | |
| 7 | 4346 | 1.5% | |
| 8 | 4291 | 1.4% | |
| 9 | 4273 | 1.4% | |
| . | 835 | 0.3% | |
| - | 500 | 0.2% | |
| ( | 453 | 0.2% | |
| ) | 453 | 0.2% | |
| / | 176 | 0.1% | |
| & | 161 | 0.1% | |
| " | 142 | < 0.1% | |
| Other values (8) | 124 | < 0.1% |
Most frequent Latin characters
| Value | Count | Frequency (%) | |
| i | 34526 | 10.9% | |
| e | 33841 | 10.7% | |
| n | 32605 | 10.3% | |
| a | 28312 | 9.0% | |
| m | 22254 | 7.0% | |
| d | 19859 | 6.3% | |
| t | 18386 | 5.8% | |
| r | 17181 | 5.4% | |
| o | 15560 | 4.9% | |
| s | 12762 | 4.0% | |
| l | 9347 | 3.0% | |
| u | 9295 | 2.9% | |
| c | 7557 | 2.4% | |
| P | 6167 | 2.0% | |
| F | 4436 | 1.4% | |
| C | 3700 | 1.2% | |
| y | 2648 | 0.8% | |
| h | 2520 | 0.8% | |
| M | 2429 | 0.8% | |
| p | 2387 | 0.8% | |
| E | 2337 | 0.7% | |
| S | 2241 | 0.7% | |
| g | 2099 | 0.7% | |
| T | 1644 | 0.5% | |
| B | 1566 | 0.5% | |
| Other values (48) | 20502 | 6.5% |
Most occurring blocks
| Value | Count | Frequency (%) | |
| ASCII | 612136 | 99.9% | |
| None | 453 | 0.1% |
Most frequent ASCII characters
| Value | Count | Frequency (%) | |
| ' | 81993 | 13.4% | |
| (space) | 70935 | 11.6% | |
| i | 34526 | 5.6% | |
| e | 33841 | 5.5% | |
| n | 32605 | 5.3% | |
| a | 28312 | 4.6% | |
| : | 27355 | 4.5% | |
| , | 22976 | 3.8% | |
| m | 22254 | 3.6% | |
| d | 19859 | 3.2% | |
| t | 18386 | 3.0% | |
| r | 17181 | 2.8% | |
| o | 15560 | 2.5% | |
| { | 13677 | 2.2% | |
| } | 13677 | 2.2% | |
| s | 12762 | 2.1% | |
| l | 9347 | 1.5% | |
| u | 9295 | 1.5% | |
| 1 | 8424 | 1.4% | |
| c | 7557 | 1.2% | |
| 2 | 6454 | 1.1% | |
| P | 6167 | 1.0% | |
| 3 | 5935 | 1.0% | |
| 4 | 5520 | 0.9% | |
| 5 | 4941 | 0.8% | |
| Other values (57) | 82597 | 13.5% |
Most frequent None characters
| Value | Count | Frequency (%) | |
| é | 294 | 64.9% | |
| ó | 30 | 6.6% | |
| í | 17 | 3.8% | |
| ö | 16 | 3.5% | |
| ñ | 15 | 3.3% | |
| è | 12 | 2.6% | |
| á | 12 | 2.6% | |
| ä | 11 | 2.4% | |
| É | 10 | 2.2% | |
| ü | 10 | 2.2% | |
| ô | 4 | 0.9% | |
| ç | 4 | 0.9% | |
| ã | 3 | 0.7% | |
| ú | 3 | 0.7% | |
| à | 2 | 0.4% | |
| õ | 2 | 0.4% | |
| ² | 1 | 0.2% | |
| ï | 1 | 0.2% | |
| Î | 1 | 0.2% | |
| ° | 1 | 0.2% | |
| Ö | 1 | 0.2% | |
| ½ | 1 | 0.2% | |
| ě | 1 | 0.2% | |
| Á | 1 | 0.2% |
production_countries
| Distinct | 469 |
|---|---|
| Distinct (%) | 9.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 37.6 KiB |
| Value | Count | Frequency (%) | |
| [{'iso_3166_1': 'US', 'name': 'United States of America'}] | 2977 | 62.0% | |
| [{'iso_3166_1': 'GB', 'name': 'United Kingdom'}, {'iso_3166_1': 'US', 'name': 'United States of America'}] | 181 | 3.8% | |
| [] | 174 | 3.6% | |
| [{'iso_3166_1': 'GB', 'name': 'United Kingdom'}] | 131 | 2.7% | |
| [{'iso_3166_1': 'DE', 'name': 'Germany'}, {'iso_3166_1': 'US', 'name': 'United States of America'}] | 119 | 2.5% | |
| [{'iso_3166_1': 'CA', 'name': 'Canada'}, {'iso_3166_1': 'US', 'name': 'United States of America'}] | 88 | 1.8% | |
| [{'iso_3166_1': 'FR', 'name': 'France'}] | 49 | 1.0% | |
| [{'iso_3166_1': 'AU', 'name': 'Australia'}, {'iso_3166_1': 'US', 'name': 'United States of America'}] | 46 | 1.0% | |
| [{'iso_3166_1': 'CA', 'name': 'Canada'}] | 46 | 1.0% | |
| [{'iso_3166_1': 'FR', 'name': 'France'}, {'iso_3166_1': 'US', 'name': 'United States of America'}] | 38 | 0.8% | |
| [{'iso_3166_1': 'DE', 'name': 'Germany'}, {'iso_3166_1': 'GB', 'name': 'United Kingdom'}, {'iso_3166_1': 'US', 'name': 'United States of America'}] | 33 | 0.7% | |
| [{'iso_3166_1': 'IN', 'name': 'India'}] | 24 | 0.5% | |
| [{'iso_3166_1': 'AU', 'name': 'Australia'}] | 21 | 0.4% | |
| [{'iso_3166_1': 'FR', 'name': 'France'}, {'iso_3166_1': 'GB', 'name': 'United Kingdom'}, {'iso_3166_1': 'US', 'name': 'United States of America'}] | 17 | 0.4% | |
| [{'iso_3166_1': 'JP', 'name': 'Japan'}, {'iso_3166_1': 'US', 'name': 'United States of America'}] | 15 | 0.3% | |
| [{'iso_3166_1': 'US', 'name': 'United States of America'}, {'iso_3166_1': 'DE', 'name': 'Germany'}] | 15 | 0.3% | |
| [{'iso_3166_1': 'DE', 'name': 'Germany'}] | 15 | 0.3% | |
| [{'iso_3166_1': 'JP', 'name': 'Japan'}] | 15 | 0.3% | |
| [{'iso_3166_1': 'US', 'name': 'United States of America'}, {'iso_3166_1': 'GB', 'name': 'United Kingdom'}] | 15 | 0.3% | |
| [{'iso_3166_1': 'FR', 'name': 'France'}, {'iso_3166_1': 'GB', 'name': 'United Kingdom'}] | 14 | 0.3% | |
| [{'iso_3166_1': 'US', 'name': 'United States of America'}, {'iso_3166_1': 'CA', 'name': 'Canada'}] | 14 | 0.3% | |
| [{'iso_3166_1': 'CA', 'name': 'Canada'}, {'iso_3166_1': 'GB', 'name': 'United Kingdom'}] | 13 | 0.3% | |
| [{'iso_3166_1': 'US', 'name': 'United States of America'}, {'iso_3166_1': 'AU', 'name': 'Australia'}] | 12 | 0.2% | |
| [{'iso_3166_1': 'CN', 'name': 'China'}, {'iso_3166_1': 'HK', 'name': 'Hong Kong'}] | 11 | 0.2% | |
| [{'iso_3166_1': 'KR', 'name': 'South Korea'}] | 10 | 0.2% | |
| Other values (444) | 710 | 14.8% |
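Because the column is compared as raw text, the same set of countries listed in a different order counts as a different value: the table has Germany plus United States with 119 rows and United States plus Germany with 15 rows. A hedged sketch of an order-insensitive count, assuming the `iso_3166_1` key shown above is present in every entry (the function name and sample rows are hypothetical):

```python
import ast
from collections import Counter


def country_set_counts(raw_values):
    """Count rows per set of ISO country codes, ignoring listing order."""
    counts = Counter()
    for raw in raw_values:
        codes = {entry["iso_3166_1"] for entry in ast.literal_eval(raw)}
        # A sorted tuple gives one canonical, hashable key per combination.
        counts[tuple(sorted(codes))] += 1
    return counts


rows = [
    "[{'iso_3166_1': 'DE', 'name': 'Germany'}, {'iso_3166_1': 'US', 'name': 'United States of America'}]",
    "[{'iso_3166_1': 'US', 'name': 'United States of America'}, {'iso_3166_1': 'DE', 'name': 'Germany'}]",
    "[]",
]
print(country_set_counts(rows))
```

Normalizing this way would likely bring the distinct count below the 469 reported for the raw strings, though the exact figure depends on the data.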
Frequencies of value counts
Unique
| Unique | 353 |
|---|---|
| Unique (%) | 7.3% |
Histogram of lengths of the category
Length
| Max length | 517 |
|---|---|
| Median length | 58 |
| Mean length | 69.92129919 |
| Min length | 2 |
Most occurring characters
| Value | Count | Frequency (%) | |
| ' | 51488 | 15.3% | |
| (space) | 33793 | 10.1% | |
| e | 19992 | 6.0% | |
| a | 16820 | 5.0% | |
| i | 16187 | 4.8% | |
| n | 13165 | 3.9% | |
| _ | 12872 | 3.8% | |
| 1 | 12872 | 3.8% | |
| 6 | 12872 | 3.8% | |
| : | 12872 | 3.8% | |
| t | 12819 | 3.8% | |
| m | 11448 | 3.4% | |
| o | 11294 | 3.4% | |
| s | 10601 | 3.2% | |
| U | 8717 | 2.6% | |
| , | 8243 | 2.5% | |
| S | 8175 | 2.4% | |
| { | 6436 | 1.9% | |
| 3 | 6436 | 1.9% | |
| } | 6436 | 1.9% | |
| d | 5708 | 1.7% | |
| r | 4964 | 1.5% | |
| [ | 4803 | 1.4% | |
| ] | 4803 | 1.4% | |
| A | 4550 | 1.4% | |
| Other values (36) | 17466 | 5.2% |
Most occurring categories
| Value | Count | Frequency (%) | |
| Lowercase Letter | 133878 | 39.9% | |
| Other Punctuation | 72603 | 21.6% | |
| Space Separator | 33793 | 10.1% | |
| Decimal Number | 32180 | 9.6% | |
| Uppercase Letter | 28028 | 8.3% | |
| Connector Punctuation | 12872 | 3.8% | |
| Open Punctuation | 11239 | 3.3% | |
| Close Punctuation | 11239 | 3.3% |
Most frequent Open Punctuation characters
| Value | Count | Frequency (%) | |
| { | 6436 | 57.3% | |
| [ | 4803 | 42.7% |
Most frequent Other Punctuation characters
| Value | Count | Frequency (%) | |
| ' | 51488 | 70.9% | |
| : | 12872 | 17.7% | |
| , | 8243 | 11.4% |
Most frequent Lowercase Letter characters
| Value | Count | Frequency (%) | |
| e | 19992 | 14.9% | |
| a | 16820 | 12.6% | |
| i | 16187 | 12.1% | |
| n | 13165 | 9.8% | |
| t | 12819 | 9.6% | |
| m | 11448 | 8.6% | |
| o | 11294 | 8.4% | |
| s | 10601 | 7.9% | |
| d | 5708 | 4.3% | |
| r | 4964 | 3.7% | |
| c | 4385 | 3.3% | |
| f | 3977 | 3.0% | |
| g | 805 | 0.6% | |
| y | 434 | 0.3% | |
| l | 401 | 0.3% | |
| u | 276 | 0.2% | |
| p | 162 | 0.1% | |
| h | 158 | 0.1% | |
| w | 85 | 0.1% | |
| z | 60 | < 0.1% | |
| b | 60 | < 0.1% | |
| x | 41 | < 0.1% | |
| k | 29 | < 0.1% | |
| v | 6 | < 0.1% | |
| j | 1 | < 0.1% |
Most frequent Connector Punctuation characters
| Value | Count | Frequency (%) | |
| _ | 12872 | 100.0% |
Most frequent Decimal Number characters
| Value | Count | Frequency (%) | |
| 1 | 12872 | 40.0% | |
| 6 | 12872 | 40.0% | |
| 3 | 6436 | 20.0% |
Most frequent Space Separator characters
| Value | Count | Frequency (%) | |
| (space) | 33793 | 100.0% | |
Most frequent Uppercase Letter characters
| Value | Count | Frequency (%) | |
| U | 8717 | 31.1% | |
| S | 8175 | 29.2% | |
| A | 4550 | 16.2% | |
| G | 979 | 3.5% | |
| K | 802 | 2.9% | |
| B | 735 | 2.6% | |
| C | 723 | 2.6% | |
| F | 625 | 2.2% | |
| E | 511 | 1.8% | |
| R | 443 | 1.6% | |
| D | 370 | 1.3% | |
| I | 367 | 1.3% | |
| N | 236 | 0.8% | |
| H | 150 | 0.5% | |
| J | 124 | 0.4% | |
| T | 111 | 0.4% | |
| Z | 103 | 0.4% | |
| M | 87 | 0.3% | |
| P | 87 | 0.3% | |
| L | 60 | 0.2% | |
| O | 32 | 0.1% | |
| X | 30 | 0.1% | |
| W | 6 | < 0.1% | |
| Y | 5 | < 0.1% |
Most frequent Close Punctuation characters
| Value | Count | Frequency (%) | |
| } | 6436 | 57.3% | |
| ] | 4803 | 42.7% |
Most occurring scripts
| Value | Count | Frequency (%) | |
| Common | 173926 | 51.8% | |
| Latin | 161906 | 48.2% |
Most frequent Common characters
| Value | Count | Frequency (%) | |
| ' | 51488 | 29.6% | |
| (space) | 33793 | 19.4% | |
| _ | 12872 | 7.4% | |
| 1 | 12872 | 7.4% | |
| 6 | 12872 | 7.4% | |
| : | 12872 | 7.4% | |
| , | 8243 | 4.7% | |
| { | 6436 | 3.7% | |
| 3 | 6436 | 3.7% | |
| } | 6436 | 3.7% | |
| [ | 4803 | 2.8% | |
| ] | 4803 | 2.8% |
Most frequent Latin characters
| Value | Count | Frequency (%) | |
| e | 19992 | 12.3% | |
| a | 16820 | 10.4% | |
| i | 16187 | 10.0% | |
| n | 13165 | 8.1% | |
| t | 12819 | 7.9% | |
| m | 11448 | 7.1% | |
| o | 11294 | 7.0% | |
| s | 10601 | 6.5% | |
| U | 8717 | 5.4% | |
| S | 8175 | 5.0% | |
| d | 5708 | 3.5% | |
| r | 4964 | 3.1% | |
| A | 4550 | 2.8% | |
| c | 4385 | 2.7% | |
| f | 3977 | 2.5% | |
| G | 979 | 0.6% | |
| g | 805 | 0.5% | |
| K | 802 | 0.5% | |
| B | 735 | 0.5% | |
| C | 723 | 0.4% | |
| F | 625 | 0.4% | |
| E | 511 | 0.3% | |
| R | 443 | 0.3% | |
| y | 434 | 0.3% | |
| l | 401 | 0.2% | |
| Other values (24) | 2646 | 1.6% |
Most occurring blocks
| Value | Count | Frequency (%) | |
| ASCII | 335832 | 100.0% |
Most frequent ASCII characters
| Value | Count | Frequency (%) | |
| ' | 51488 | 15.3% | |
| (space) | 33793 | 10.1% | |
| e | 19992 | 6.0% | |
| a | 16820 | 5.0% | |
| i | 16187 | 4.8% | |
| n | 13165 | 3.9% | |
| _ | 12872 | 3.8% | |
| 1 | 12872 | 3.8% | |
| 6 | 12872 | 3.8% | |
| : | 12872 | 3.8% | |
| t | 12819 | 3.8% | |
| m | 11448 | 3.4% | |
| o | 11294 | 3.4% | |
| s | 10601 | 3.2% | |
| U | 8717 | 2.6% | |
| , | 8243 | 2.5% | |
| S | 8175 | 2.4% | |
| { | 6436 | 1.9% | |
| 3 | 6436 | 1.9% | |
| } | 6436 | 1.9% | |
| d | 5708 | 1.7% | |
| r | 4964 | 1.5% | |
| [ | 4803 | 1.4% | |
| ] | 4803 | 1.4% | |
| A | 4550 | 1.4% | |
| Other values (36) | 17466 | 5.2% |
release_date
Categorical
| Distinct | 3281 |
|---|---|
| Distinct (%) | 68.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 37.6 KiB |
| 2006-01-01 | 10 |
|---|---|
| 2002-01-01 | 8 |
| 2014-12-25 | 7 |
| 1999-10-22 | 7 |
| 2013-07-18 | 7 |
| Other values (3276) | 4764 |
| Value | Count | Frequency (%) | |
| 2006-01-01 | 10 | 0.2% | |
| 2002-01-01 | 8 | 0.2% | |
| 2014-12-25 | 7 | 0.1% | |
| 1999-10-22 | 7 | 0.1% | |
| 2013-07-18 | 7 | 0.1% | |
| 2004-09-03 | 7 | 0.1% | |
| 2007-01-01 | 6 | 0.1% | |
| 2011-09-30 | 6 | 0.1% | |
| 2011-09-16 | 6 | 0.1% | |
| 2015-10-16 | 6 | 0.1% | |
| 2005-01-01 | 6 | 0.1% | |
| 2005-09-16 | 6 | 0.1% | |
| 2003-01-01 | 6 | 0.1% | |
| 2010-01-01 | 5 | 0.1% | |
| 2008-01-01 | 5 | 0.1% | |
| 1998-12-25 | 5 | 0.1% | |
| 2001-09-07 | 5 | 0.1% | |
| 1999-10-08 | 5 | 0.1% | |
| 2006-09-01 | 5 | 0.1% | |
| 2014-04-16 | 5 | 0.1% | |
| 2008-10-10 | 5 | 0.1% | |
| 2002-12-13 | 5 | 0.1% | |
| 2005-05-13 | 5 | 0.1% | |
| 2006-08-11 | 5 | 0.1% | |
| 2000-09-08 | 5 | 0.1% | |
| Other values (3256) | 4655 | 96.9% |
Frequencies of value counts
Unique
| Unique | 2265 |
|---|---|
| Unique (%) | 47.2% |
Histogram of lengths of the category
Length
| Max length | 10 |
|---|---|
| Median length | 10 |
| Mean length | 9.998542578 |
| Min length | 3 |
Most occurring characters
| Value | Count | Frequency (%) | |
| 0 | 11926 | 24.8% | |
| - | 9604 | 20.0% | |
| 1 | 7613 | 15.9% | |
| 2 | 6733 | 14.0% | |
| 9 | 3583 | 7.5% | |
| 3 | 1532 | 3.2% | |
| 8 | 1521 | 3.2% | |
| 5 | 1450 | 3.0% | |
| 6 | 1411 | 2.9% | |
| 4 | 1333 | 2.8% | |
| 7 | 1314 | 2.7% | |
| U | 1 | < 0.1% | |
| N | 1 | < 0.1% | |
| K | 1 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) | |
| Decimal Number | 38416 | 80.0% | |
| Dash Punctuation | 9604 | 20.0% | |
| Uppercase Letter | 3 | < 0.1% |
Most frequent Decimal Number characters
| Value | Count | Frequency (%) | |
| 0 | 11926 | 31.0% | |
| 1 | 7613 | 19.8% | |
| 2 | 6733 | 17.5% | |
| 9 | 3583 | 9.3% | |
| 3 | 1532 | 4.0% | |
| 8 | 1521 | 4.0% | |
| 5 | 1450 | 3.8% | |
| 6 | 1411 | 3.7% | |
| 4 | 1333 | 3.5% | |
| 7 | 1314 | 3.4% |
Most frequent Dash Punctuation characters
| Value | Count | Frequency (%) | |
| - | 9604 | 100.0% |
Most frequent Uppercase Letter characters
| Value | Count | Frequency (%) | |
| U | 1 | 33.3% | |
| N | 1 | 33.3% | |
| K | 1 | 33.3% |
Most occurring scripts
| Value | Count | Frequency (%) | |
| Common | 48020 | > 99.9% | |
| Latin | 3 | < 0.1% |
Most frequent Common characters
| Value | Count | Frequency (%) | |
| 0 | 11926 | 24.8% | |
| - | 9604 | 20.0% | |
| 1 | 7613 | 15.9% | |
| 2 | 6733 | 14.0% | |
| 9 | 3583 | 7.5% | |
| 3 | 1532 | 3.2% | |
| 8 | 1521 | 3.2% | |
| 5 | 1450 | 3.0% | |
| 6 | 1411 | 2.9% | |
| 4 | 1333 | 2.8% | |
| 7 | 1314 | 2.7% |
Most frequent Latin characters
| Value | Count | Frequency (%) | |
| U | 1 | 33.3% | |
| N | 1 | 33.3% | |
| K | 1 | 33.3% |
Most occurring blocks
| Value | Count | Frequency (%) | |
| ASCII | 48023 | 100.0% |
Most frequent ASCII characters
| Value | Count | Frequency (%) | |
| 0 | 11926 | 24.8% | |
| - | 9604 | 20.0% | |
| 1 | 7613 | 15.9% | |
| 2 | 6733 | 14.0% | |
| 9 | 3583 | 7.5% | |
| 3 | 1532 | 3.2% | |
| 8 | 1521 | 3.2% | |
| 5 | 1450 | 3.0% | |
| 6 | 1411 | 2.9% | |
| 4 | 1333 | 2.8% | |
| 7 | 1314 | 2.7% | |
| U | 1 | < 0.1% | |
| N | 1 | < 0.1% | |
| K | 1 | < 0.1% |
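The character counts above suggest that exactly one value in this column is not a date: the letters U, N and K each occur once, the minimum length is 3 against a median of 10, and the 48,023 total characters equal 4,802 × 10 + 3. A minimal sketch of how such a column might be parsed follows, assuming the stray value is the placeholder string `UNK`. The name `parse_release_date` and the constant `PLACEHOLDER` are hypothetical, introduced only for illustration.

```python
from datetime import date
from typing import Optional

# Assumed sentinel, inferred from the single U, N and K in the character table.
PLACEHOLDER = "UNK"


def parse_release_date(value: str) -> Optional[date]:
    """Return a date for ISO 'YYYY-MM-DD' text, or None for the placeholder or malformed text."""
    text = value.strip()
    if text == PLACEHOLDER:
        return None
    try:
        return date.fromisoformat(text)
    except ValueError:
        # Anything else that is not a valid ISO date is treated as missing.
        return None


samples = ["2006-01-01", "2014-12-25", "UNK"]
parsed = [parse_release_date(s) for s in samples]
```

Mapping the placeholder to `None` keeps the column convertible to a real date type, at the cost of the report then showing one missing cell instead of zero.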
| Distinct | 3297 |
|---|---|
| Distinct (%) | 68.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 82260638.65 |
| Minimum | 0 |
| Maximum | 2787965087 |
| Zeros | 1427 |
| Zeros (%) | 29.7% |
| Memory size | 37.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 19170001 |
| Q3 | 92917187 |
| 95-th percentile | 369284902.7 |
| Maximum | 2787965087 |
| Range | 2787965087 |
| Interquartile range (IQR) | 92917187 |
Descriptive statistics
| Standard deviation | 162857100.9 |
|---|---|
| Coefficient of variation (CV) | 1.979769469 |
| Kurtosis | 33.12362966 |
| Mean | 82260638.65 |
| Median Absolute Deviation (MAD) | 19170001 |
| Skewness | 4.444716448 |
| Sum | 3.950978474e+11 |
| Variance | 2.652243533e+16 |
| Monotonicity | Not monotonic |
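The derived figures in the tables above can be cross-checked from the primary ones. The sketch below recomputes the coefficient of variation, the interquartile range, the variance, the sum and the share of zeros, using only numbers printed in this report. The variable names are illustrative, and the observation count of 4,803 is taken from the dataset statistics.

```python
import math

# Values copied from the report tables above.
n_obs = 4803
mean = 82260638.65
std = 162857100.9
q1, q3 = 0, 92917187
zeros = 1427

cv = std / mean          # coefficient of variation
iqr = q3 - q1            # interquartile range
variance = std ** 2      # variance is the square of the standard deviation
total = mean * n_obs     # sum of all values
zero_share = zeros / n_obs

# Compare against the reported figures, allowing for rounding in the printed values.
assert math.isclose(cv, 1.979769469, rel_tol=1e-6)
assert iqr == 92917187
assert math.isclose(variance, 2.652243533e16, rel_tol=1e-6)
assert math.isclose(total, 3.950978474e11, rel_tol=1e-6)
assert round(zero_share * 100, 1) == 29.7
```

Because Q1 is 0, the IQR equals Q3. With nearly 30% of the values at exactly zero, it may be worth checking whether zero stands for "not recorded" before reading the mean or the CV at face value.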
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
| 0 | 1427 | 29.7% | |
| 7000000 | 6 | 0.1% | |
| 8000000 | 6 | 0.1% | |
| 6000000 | 5 | 0.1% | |
| 12000000 | 5 | 0.1% | |
| 10000000 | 5 | 0.1% | |
| 100000000 | 5 | 0.1% | |
| 14000000 | 4 | 0.1% | |
| 25000000 | 4 | 0.1% | |
| 11000000 | 4 | 0.1% | |
| 5000000 | 4 | 0.1% | |
| 32000000 | 3 | 0.1% | |
| 13000000 | 3 | 0.1% | |
| 60000000 | 3 | 0.1% | |
| 7800000 | 3 | 0.1% | |
| 14400000 | 3 | 0.1% | |
| 4000000 | 3 | 0.1% | |
| 17000000 | 3 | 0.1% | |
| 30000000 | 3 | 0.1% | |
| 77000000 | 2 | < 0.1% | |
| 20000000 | 2 | < 0.1% | |
| 29000000 | 2 | < 0.1% | |
| 42000000 | 2 | < 0.1% | |
| 2200000 | 2 | < 0.1% | |
| 8500000 | 2 | < 0.1% | |
| Other values (3272) | 3292 | 68.5% |
| Value | Count | Frequency (%) | |
| 0 | 1427 | 29.7% | |
| 5 | 1 | < 0.1% | |
| 7 | 2 | < 0.1% | |
| 10 | 1 | < 0.1% | |
| 11 | 2 | < 0.1% | |
| 12 | 2 | < 0.1% | |
| 13 | 1 | < 0.1% | |
| 14 | 1 | < 0.1% | |
| 15 | 1 | < 0.1% | |
| 16 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 2787965087 | 1 | < 0.1% | |
| 1845034188 | 1 | < 0.1% | |
| 1519557910 | 1 | < 0.1% | |
| 1513528810 | 1 | < 0.1% | |
| 1506249360 | 1 | < 0.1% | |
| 1405403694 | 1 | < 0.1% | |
| 1274219009 | 1 | < 0.1% | |
| 1215439994 | 1 | < 0.1% | |
| 1156730962 | 1 | < 0.1% | |
| 1153304495 | 1 | < 0.1% |
spoken_languages
Categorical
| Distinct | 544 |
|---|---|
| Distinct (%) | 11.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 37.6 KiB |
| [{'iso_639_1': 'en', 'name': 'English'}] | 3171 |
|---|---|
| [{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'es', 'name': 'Español'}] | 127 |
| [{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'fr', 'name': 'Français'}] | 114 |
| [] | 86 |
| [{'iso_639_1': 'es', 'name': 'Español'}, {'iso_639_1': 'en', 'name': 'English'}] | 54 |
| Other values (539) | 1251 |
| Value | Count | Frequency (%) | |
| [{'iso_639_1': 'en', 'name': 'English'}] | 3171 | 66.0% | |
| [{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'es', 'name': 'Español'}] | 127 | 2.6% | |
| [{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'fr', 'name': 'Français'}] | 114 | 2.4% | |
| [] | 86 | 1.8% | |
| [{'iso_639_1': 'es', 'name': 'Español'}, {'iso_639_1': 'en', 'name': 'English'}] | 54 | 1.1% | |
| [{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'de', 'name': 'Deutsch'}] | 53 | 1.1% | |
| [{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'it', 'name': 'Italiano'}] | 51 | 1.1% | |
| [{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'ru', 'name': 'Pусский'}] | 50 | 1.0% | |
| [{'iso_639_1': 'fr', 'name': 'Français'}] | 49 | 1.0% | |
| [{'iso_639_1': 'es', 'name': 'Español'}] | 23 | 0.5% | |
| [{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'ja', 'name': '日本語'}] | 23 | 0.5% | |
| [{'iso_639_1': 'fr', 'name': 'Français'}, {'iso_639_1': 'en', 'name': 'English'}] | 23 | 0.5% | |
| [{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'pt', 'name': 'Português'}] | 22 | 0.5% | |
| [{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'pl', 'name': 'Polski'}] | 22 | 0.5% | |
| [{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'zh', 'name': '普通话'}] | 19 | 0.4% | |
| [{'iso_639_1': 'it', 'name': 'Italiano'}, {'iso_639_1': 'en', 'name': 'English'}] | 17 | 0.4% | |
| [{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'th', 'name': 'ภาษาไทย'}] | 15 | 0.3% | |
| [{'iso_639_1': 'hi', 'name': 'हिन्दी'}] | 15 | 0.3% | |
| [{'iso_639_1': 'cs', 'name': 'Český'}, {'iso_639_1': 'en', 'name': 'English'}] | 14 | 0.3% | |
| [{'iso_639_1': 'de', 'name': 'Deutsch'}] | 14 | 0.3% | |
| [{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'fr', 'name': 'Français'}, {'iso_639_1': 'de', 'name': 'Deutsch'}] | 14 | 0.3% | |
| [{'iso_639_1': 'zh', 'name': '普通话'}] | 13 | 0.3% | |
| [{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'fr', 'name': 'Français'}, {'iso_639_1': 'it', 'name': 'Italiano'}] | 12 | 0.2% | |
| [{'iso_639_1': 'ru', 'name': 'Pусский'}] | 12 | 0.2% | |
| [{'iso_639_1': 'de', 'name': 'Deutsch'}, {'iso_639_1': 'en', 'name': 'English'}] | 12 | 0.2% | |
| Other values (519) | 778 | 16.2% |
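Each value in the table above is a string that looks like a Python list of dictionaries, which is why the column is profiled as high-cardinality text: the same languages in a different order count as a different value. A minimal sketch of turning one cell into a list of language codes follows, assuming every cell is a valid Python literal in this shape. The helper name `language_codes` is hypothetical.

```python
import ast
from typing import List


def language_codes(cell: str) -> List[str]:
    """Parse a stringified list of {'iso_639_1': ..., 'name': ...} dicts into ISO 639-1 codes."""
    # literal_eval only evaluates Python literals, so it is safer than eval for data cells.
    entries = ast.literal_eval(cell)
    return [entry["iso_639_1"] for entry in entries]


cells = [
    "[{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'es', 'name': 'Español'}]",
    "[{'iso_639_1': 'es', 'name': 'Español'}, {'iso_639_1': 'en', 'name': 'English'}]",
    "[]",
]
codes = [language_codes(c) for c in cells]

# Sorting the codes makes the first two cells compare equal despite their different order.
normalized = [tuple(sorted(c)) for c in codes]
```

After normalizing like this, the distinct count would likely drop well below 544, since several of the most common values differ only in order.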
Frequencies of value counts
Unique
| Unique | 416 |
|---|---|
| Unique (%) | 8.7% |
Histogram of lengths of the category
Length
| Max length | 350 |
|---|---|
| Median length | 40 |
| Mean length | 57.68394753 |
| Min length | 2 |
Most occurring characters
| Value | Count | Frequency (%) | |
| ' | 55496 | 20.0% | |
| (space) | 23151 | 8.4% | |
| n | 16768 | 6.1% | |
| _ | 13874 | 5.0% | |
| : | 13874 | 5.0% | |
| s | 13221 | 4.8% | |
| i | 12537 | 4.5% | |
| e | 12486 | 4.5% | |
| , | 9157 | 3.3% | |
| a | 9055 | 3.3% | |
| o | 7712 | 2.8% | |
| m | 6969 | 2.5% | |
| { | 6937 | 2.5% | |
| 6 | 6937 | 2.5% | |
| 3 | 6937 | 2.5% | |
| 9 | 6937 | 2.5% | |
| 1 | 6937 | 2.5% | |
| } | 6937 | 2.5% | |
| l | 5259 | 1.9% | |
| h | 5047 | 1.8% | |
| E | 4840 | 1.7% | |
| [ | 4803 | 1.7% | |
| ] | 4803 | 1.7% | |
| g | 4641 | 1.7% | |
| r | 1355 | 0.5% | |
| Other values (152) | 10386 | 3.7% |
Most occurring categories
| Value | Count | Frequency (%) | |
| Lowercase Letter | 101285 | 36.6% | |
| Other Punctuation | 78612 | 28.4% | |
| Decimal Number | 27748 | 10.0% | |
| Space Separator | 23151 | 8.4% | |
| Connector Punctuation | 13874 | 5.0% | |
| Open Punctuation | 11740 | 4.2% | |
| Close Punctuation | 11740 | 4.2% | |
| Uppercase Letter | 6336 | 2.3% | |
| Other Letter | 2301 | 0.8% | |
| Nonspacing Mark | 156 | 0.1% | |
| Spacing Mark | 113 | < 0.1% |
Most frequent Open Punctuation characters
| Value | Count | Frequency (%) | |
| { | 6937 | 59.1% | |
| [ | 4803 | 40.9% |
Most frequent Other Punctuation characters
| Value | Count | Frequency (%) | |
| ' | 55496 | 70.6% | |
| : | 13874 | 17.6% | |
| , | 9157 | 11.6% | |
| / | 79 | 0.1% | |
| ? | 6 | < 0.1% |
Most frequent Lowercase Letter characters
| Value | Count | Frequency (%) | |
| n | 16768 | 16.6% | |
| s | 13221 | 13.1% | |
| i | 12537 | 12.4% | |
| e | 12486 | 12.3% | |
| a | 9055 | 8.9% | |
| o | 7712 | 7.6% | |
| m | 6969 | 6.9% | |
| l | 5259 | 5.2% | |
| h | 5047 | 5.0% | |
| g | 4641 | 4.6% | |
| r | 1355 | 1.3% | |
| t | 920 | 0.9% | |
| u | 668 | 0.7% | |
| p | 490 | 0.5% | |
| f | 467 | 0.5% | |
| ç | 455 | 0.4% | |
| с | 382 | 0.4% | |
| c | 353 | 0.3% | |
| ñ | 351 | 0.3% | |
| d | 307 | 0.3% | |
| k | 235 | 0.2% | |
| к | 209 | 0.2% | |
| и | 200 | 0.2% | |
| й | 194 | 0.2% | |
| у | 185 | 0.2% | |
| Other values (44) | 819 | 0.8% |
Most frequent Connector Punctuation characters
| Value | Count | Frequency (%) | |
| _ | 13874 | 100.0% |
Most frequent Decimal Number characters
| Value | Count | Frequency (%) | |
| 6 | 6937 | 25.0% | |
| 3 | 6937 | 25.0% | |
| 9 | 6937 | 25.0% | |
| 1 | 6937 | 25.0% |
Most frequent Space Separator characters
| Value | Count | Frequency (%) | |
| (space) | 23151 | 100.0% |
Most frequent Uppercase Letter characters
| Value | Count | Frequency (%) | |
| E | 4840 | 76.4% | |
| F | 437 | 6.9% | |
| P | 306 | 4.8% | |
| D | 276 | 4.4% | |
| I | 188 | 3.0% | |
| L | 54 | 0.9% | |
| M | 42 | 0.7% | |
| Č | 38 | 0.6% | |
| T | 35 | 0.6% | |
| N | 25 | 0.4% | |
| V | 17 | 0.3% | |
| R | 13 | 0.2% | |
| S | 12 | 0.2% | |
| У | 9 | 0.1% | |
| K | 8 | 0.1% | |
| A | 7 | 0.1% | |
| G | 7 | 0.1% | |
| Í | 5 | 0.1% | |
| B | 5 | 0.1% | |
| H | 4 | 0.1% | |
| Z | 4 | 0.1% | |
| C | 3 | < 0.1% | |
| W | 1 | < 0.1% |
Most frequent Close Punctuation characters
| Value | Count | Frequency (%) | |
| } | 6937 | 59.1% | |
| ] | 4803 | 40.9% |
Most frequent Other Letter characters
| Value | Count | Frequency (%) | |
| 话 | 155 | 6.7% | |
| 普 | 107 | 4.7% | |
| 通 | 107 | 4.7% | |
| 日 | 97 | 4.2% | |
| 本 | 97 | 4.2% | |
| 語 | 97 | 4.2% | |
| 州 | 96 | 4.2% | |
| ا | 94 | 4.1% | |
| ر | 94 | 4.1% | |
| า | 80 | 3.5% | |
| ل | 67 | 2.9% | |
| ع | 67 | 2.9% | |
| ب | 67 | 2.9% | |
| ي | 67 | 2.9% | |
| ة | 67 | 2.9% | |
| ह | 48 | 2.1% | |
| न | 48 | 2.1% | |
| द | 48 | 2.1% | |
| 广 | 48 | 2.1% | |
| 廣 | 48 | 2.1% | |
| 話 | 48 | 2.1% | |
| ภ | 40 | 1.7% | |
| ษ | 40 | 1.7% | |
| ไ | 40 | 1.7% | |
| ท | 40 | 1.7% | |
| Other values (31) | 494 | 21.5% |
Most frequent Spacing Mark characters
| Value | Count | Frequency (%) | |
| ि | 48 | 42.5% | |
| ी | 48 | 42.5% | |
| ி | 4 | 3.5% | |
| ਾ | 4 | 3.5% | |
| ੀ | 4 | 3.5% | |
| ు | 2 | 1.8% | |
| া | 2 | 1.8% | |
| ং | 1 | 0.9% |
Most frequent Nonspacing Mark characters
| Value | Count | Frequency (%) | |
| ִ | 66 | 42.3% | |
| ् | 48 | 30.8% | |
| ְ | 33 | 21.2% | |
| ் | 4 | 2.6% | |
| ੰ | 4 | 2.6% | |
| ె | 1 | 0.6% |
Most occurring scripts
| Value | Count | Frequency (%) | |
| Common | 166865 | 60.2% | |
| Latin | 106196 | 38.3% | |
| Cyrillic | 1258 | 0.5% | |
| Han | 900 | 0.3% | |
| Arabic | 597 | 0.2% | |
| Devanagari | 288 | 0.1% | |
| Thai | 280 | 0.1% | |
| Hebrew | 264 | 0.1% | |
| Hangul | 186 | 0.1% | |
| Greek | 160 | 0.1% | |
| Gurmukhi | 24 | < 0.1% | |
| Tamil | 20 | < 0.1% | |
| Georgian | 7 | < 0.1% | |
| Telugu | 6 | < 0.1% | |
| Bengali | 5 | < 0.1% |
Most frequent Common characters
| Value | Count | Frequency (%) | |
| ' | 55496 | 33.3% | |
| (space) | 23151 | 13.9% | |
| _ | 13874 | 8.3% | |
| : | 13874 | 8.3% | |
| , | 9157 | 5.5% | |
| { | 6937 | 4.2% | |
| 6 | 6937 | 4.2% | |
| 3 | 6937 | 4.2% | |
| 9 | 6937 | 4.2% | |
| 1 | 6937 | 4.2% | |
| } | 6937 | 4.2% | |
| [ | 4803 | 2.9% | |
| ] | 4803 | 2.9% | |
| / | 79 | < 0.1% | |
| ? | 6 | < 0.1% |
Most frequent Latin characters
| Value | Count | Frequency (%) | |
| n | 16768 | 15.8% | |
| s | 13221 | 12.4% | |
| i | 12537 | 11.8% | |
| e | 12486 | 11.8% | |
| a | 9055 | 8.5% | |
| o | 7712 | 7.3% | |
| m | 6969 | 6.6% | |
| l | 5259 | 5.0% | |
| h | 5047 | 4.8% | |
| E | 4840 | 4.6% | |
| g | 4641 | 4.4% | |
| r | 1355 | 1.3% | |
| t | 920 | 0.9% | |
| u | 668 | 0.6% | |
| p | 490 | 0.5% | |
| f | 467 | 0.4% | |
| ç | 455 | 0.4% | |
| F | 437 | 0.4% | |
| c | 353 | 0.3% | |
| ñ | 351 | 0.3% | |
| d | 307 | 0.3% | |
| P | 306 | 0.3% | |
| D | 276 | 0.3% | |
| k | 235 | 0.2% | |
| I | 188 | 0.2% | |
| Other values (35) | 853 | 0.8% |
Most frequent Greek characters
| Value | Count | Frequency (%) | |
| λ | 40 | 25.0% | |
| ε | 20 | 12.5% | |
| η | 20 | 12.5% | |
| ν | 20 | 12.5% | |
| ι | 20 | 12.5% | |
| κ | 20 | 12.5% | |
| ά | 20 | 12.5% |
Most frequent Han characters
| Value | Count | Frequency (%) | |
| 话 | 155 | 17.2% | |
| 普 | 107 | 11.9% | |
| 通 | 107 | 11.9% | |
| 日 | 97 | 10.8% | |
| 本 | 97 | 10.8% | |
| 語 | 97 | 10.8% | |
| 州 | 96 | 10.7% | |
| 广 | 48 | 5.3% | |
| 廣 | 48 | 5.3% | |
| 話 | 48 | 5.3% |
Most frequent Thai characters
| Value | Count | Frequency (%) | |
| า | 80 | 28.6% | |
| ภ | 40 | 14.3% | |
| ษ | 40 | 14.3% | |
| ไ | 40 | 14.3% | |
| ท | 40 | 14.3% | |
| ย | 40 | 14.3% |
Most frequent Cyrillic characters
| Value | Count | Frequency (%) | |
| с | 382 | 30.4% | |
| к | 209 | 16.6% | |
| и | 200 | 15.9% | |
| й | 194 | 15.4% | |
| у | 185 | 14.7% | |
| а | 16 | 1.3% | |
| р | 12 | 1.0% | |
| У | 9 | 0.7% | |
| ї | 9 | 0.7% | |
| н | 9 | 0.7% | |
| ь | 9 | 0.7% | |
| з | 5 | 0.4% | |
| қ | 4 | 0.3% | |
| б | 3 | 0.2% | |
| ъ | 3 | 0.2% | |
| л | 3 | 0.2% | |
| г | 3 | 0.2% | |
| е | 3 | 0.2% |
Most frequent Devanagari characters
| Value | Count | Frequency (%) | |
| ह | 48 | 16.7% | |
| ि | 48 | 16.7% | |
| न | 48 | 16.7% | |
| ् | 48 | 16.7% | |
| द | 48 | 16.7% | |
| ी | 48 | 16.7% |
Most frequent Arabic characters
| Value | Count | Frequency (%) | |
| ا | 94 | 15.7% | |
| ر | 94 | 15.7% | |
| ل | 67 | 11.2% | |
| ع | 67 | 11.2% | |
| ب | 67 | 11.2% | |
| ي | 67 | 11.2% | |
| ة | 67 | 11.2% | |
| و | 17 | 2.8% | |
| د | 15 | 2.5% | |
| ف | 12 | 2.0% | |
| س | 12 | 2.0% | |
| ی | 12 | 2.0% | |
| پ | 2 | 0.3% | |
| ښ | 2 | 0.3% | |
| ت | 2 | 0.3% |
Most frequent Hangul characters
| Value | Count | Frequency (%) | |
| 한 | 31 | 16.7% | |
| 국 | 31 | 16.7% | |
| 어 | 31 | 16.7% | |
| 조 | 31 | 16.7% | |
| 선 | 31 | 16.7% | |
| 말 | 31 | 16.7% |
Most frequent Tamil characters
| Value | Count | Frequency (%) | |
| த | 4 | 20.0% | |
| ம | 4 | 20.0% | |
| ி | 4 | 20.0% | |
| ழ | 4 | 20.0% | |
| ் | 4 | 20.0% |
Most frequent Hebrew characters
| Value | Count | Frequency (%) | |
| ִ | 66 | 25.0% | |
| ע | 33 | 12.5% | |
| ב | 33 | 12.5% | |
| ְ | 33 | 12.5% | |
| ר | 33 | 12.5% | |
| י | 33 | 12.5% | |
| ת | 33 | 12.5% |
Most frequent Gurmukhi characters
| Value | Count | Frequency (%) | |
| ਪ | 4 | 16.7% | |
| ੰ | 4 | 16.7% | |
| ਜ | 4 | 16.7% | |
| ਾ | 4 | 16.7% | |
| ਬ | 4 | 16.7% | |
| ੀ | 4 | 16.7% |
Most frequent Telugu characters
| Value | Count | Frequency (%) | |
| ు | 2 | 33.3% | |
| త | 1 | 16.7% | |
| ె | 1 | 16.7% | |
| ల | 1 | 16.7% | |
| గ | 1 | 16.7% |
Most frequent Georgian characters
| Value | Count | Frequency (%) | |
| ქ | 1 | 14.3% | |
| ა | 1 | 14.3% | |
| რ | 1 | 14.3% | |
| თ | 1 | 14.3% | |
| უ | 1 | 14.3% | |
| ლ | 1 | 14.3% | |
| ი | 1 | 14.3% |
Most frequent Bengali characters
| Value | Count | Frequency (%) | |
| া | 2 | 40.0% | |
| ব | 1 | 20.0% | |
| ং | 1 | 20.0% | |
| ল | 1 | 20.0% |
Most occurring blocks
| Value | Count | Frequency (%) | |
| ASCII | 272023 | 98.2% | |
| Cyrillic | 1258 | 0.5% | |
| None | 1164 | 0.4% | |
| CJK | 900 | 0.3% | |
| Arabic | 597 | 0.2% | |
| Devanagari | 288 | 0.1% | |
| Thai | 280 | 0.1% | |
| Hebrew | 264 | 0.1% | |
| Hangul | 186 | 0.1% | |
| Latin Ext Additional | 34 | < 0.1% | |
| Gurmukhi | 24 | < 0.1% | |
| Tamil | 20 | < 0.1% | |
| Georgian | 7 | < 0.1% | |
| Telugu | 6 | < 0.1% | |
| Bengali | 5 | < 0.1% |
Most frequent ASCII characters
| Value | Count | Frequency (%) | |
| ' | 55496 | 20.4% | |
| (space) | 23151 | 8.5% | |
| n | 16768 | 6.2% | |
| _ | 13874 | 5.1% | |
| : | 13874 | 5.1% | |
| s | 13221 | 4.9% | |
| i | 12537 | 4.6% | |
| e | 12486 | 4.6% | |
| , | 9157 | 3.4% | |
| a | 9055 | 3.3% | |
| o | 7712 | 2.8% | |
| m | 6969 | 2.6% | |
| { | 6937 | 2.6% | |
| 6 | 6937 | 2.6% | |
| 3 | 6937 | 2.6% | |
| 9 | 6937 | 2.6% | |
| 1 | 6937 | 2.6% | |
| } | 6937 | 2.6% | |
| l | 5259 | 1.9% | |
| h | 5047 | 1.9% | |
| E | 4840 | 1.8% | |
| [ | 4803 | 1.8% | |
| ] | 4803 | 1.8% | |
| g | 4641 | 1.7% | |
| r | 1355 | 0.5% | |
| Other values (36) | 5353 | 2.0% |
Most frequent None characters
| Value | Count | Frequency (%) | |
| ç | 455 | 39.1% | |
| ñ | 351 | 30.2% | |
| ê | 68 | 5.8% | |
| λ | 40 | 3.4% | |
| Č | 38 | 3.3% | |
| ý | 38 | 3.3% | |
| ε | 20 | 1.7% | |
| η | 20 | 1.7% | |
| ν | 20 | 1.7% | |
| ι | 20 | 1.7% | |
| κ | 20 | 1.7% | |
| ά | 20 | 1.7% | |
| ü | 18 | 1.5% | |
| â | 13 | 1.1% | |
| ă | 13 | 1.1% | |
| Í | 5 | 0.4% | |
| č | 3 | 0.3% | |
| à | 1 | 0.1% | |
| š | 1 | 0.1% |
Most frequent CJK characters
| Value | Count | Frequency (%) | |
| 话 | 155 | 17.2% | |
| 普 | 107 | 11.9% | |
| 通 | 107 | 11.9% | |
| 日 | 97 | 10.8% | |
| 本 | 97 | 10.8% | |
| 語 | 97 | 10.8% | |
| 州 | 96 | 10.7% | |
| 广 | 48 | 5.3% | |
| 廣 | 48 | 5.3% | |
| 話 | 48 | 5.3% |
Most frequent Thai characters
| Value | Count | Frequency (%) | |
| า | 80 | 28.6% | |
| ภ | 40 | 14.3% | |
| ษ | 40 | 14.3% | |
| ไ | 40 | 14.3% | |
| ท | 40 | 14.3% | |
| ย | 40 | 14.3% |
Most frequent Cyrillic characters
| Value | Count | Frequency (%) | |
| с | 382 | 30.4% | |
| к | 209 | 16.6% | |
| и | 200 | 15.9% | |
| й | 194 | 15.4% | |
| у | 185 | 14.7% | |
| а | 16 | 1.3% | |
| р | 12 | 1.0% | |
| У | 9 | 0.7% | |
| ї | 9 | 0.7% | |
| н | 9 | 0.7% | |
| ь | 9 | 0.7% | |
| з | 5 | 0.4% | |
| қ | 4 | 0.3% | |
| б | 3 | 0.2% | |
| ъ | 3 | 0.2% | |
| л | 3 | 0.2% | |
| г | 3 | 0.2% | |
| е | 3 | 0.2% |
Most frequent Devanagari characters
| Value | Count | Frequency (%) | |
| ह | 48 | 16.7% | |
| ि | 48 | 16.7% | |
| न | 48 | 16.7% | |
| ् | 48 | 16.7% | |
| द | 48 | 16.7% | |
| ी | 48 | 16.7% |
Most frequent Arabic characters
| Value | Count | Frequency (%) | |
| ا | 94 | 15.7% | |
| ر | 94 | 15.7% | |
| ل | 67 | 11.2% | |
| ع | 67 | 11.2% | |
| ب | 67 | 11.2% | |
| ي | 67 | 11.2% | |
| ة | 67 | 11.2% | |
| و | 17 | 2.8% | |
| د | 15 | 2.5% | |
| ف | 12 | 2.0% | |
| س | 12 | 2.0% | |
| ی | 12 | 2.0% | |
| پ | 2 | 0.3% | |
| ښ | 2 | 0.3% | |
| ت | 2 | 0.3% |
Most frequent Hangul characters
| Value | Count | Frequency (%) | |
| 한 | 31 | 16.7% | |
| 국 | 31 | 16.7% | |
| 어 | 31 | 16.7% | |
| 조 | 31 | 16.7% | |
| 선 | 31 | 16.7% | |
| 말 | 31 | 16.7% |
Most frequent Tamil characters
| Value | Count | Frequency (%) | |
| த | 4 | 20.0% | |
| ம | 4 | 20.0% | |
| ி | 4 | 20.0% | |
| ழ | 4 | 20.0% | |
| ் | 4 | 20.0% |
Most frequent Hebrew characters
| Value | Count | Frequency (%) | |
| ִ | 66 | 25.0% | |
| ע | 33 | 12.5% | |
| ב | 33 | 12.5% | |
| ְ | 33 | 12.5% | |
| ר | 33 | 12.5% | |
| י | 33 | 12.5% | |
| ת | 33 | 12.5% |
Most frequent Latin Ext Additional characters
| Value | Count | Frequency (%) | |
| ế | 17 | 50.0% | |
| ệ | 17 | 50.0% |
Most frequent Gurmukhi characters
| Value | Count | Frequency (%) | |
| ਪ | 4 | 16.7% | |
| ੰ | 4 | 16.7% | |
| ਜ | 4 | 16.7% | |
| ਾ | 4 | 16.7% | |
| ਬ | 4 | 16.7% | |
| ੀ | 4 | 16.7% |
Most frequent Telugu characters
| Value | Count | Frequency (%) | |
| ు | 2 | 33.3% | |
| త | 1 | 16.7% | |
| ె | 1 | 16.7% | |
| ల | 1 | 16.7% | |
| గ | 1 | 16.7% |
Most frequent Georgian characters
| Value | Count | Frequency (%) | |
| ქ | 1 | 14.3% | |
| ა | 1 | 14.3% | |
| რ | 1 | 14.3% | |
| თ | 1 | 14.3% | |
| უ | 1 | 14.3% | |
| ლ | 1 | 14.3% | |
| ი | 1 | 14.3% |
Most frequent Bengali characters
| Value | Count | Frequency (%) | |
| া | 2 | 40.0% | |
| ব | 1 | 20.0% | |
| ং | 1 | 20.0% | |
| ল | 1 | 20.0% |
status
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 37.6 KiB |
| Released | 4795 |
|---|---|
| Rumored | 5 |
| Post Production | 3 |
| Value | Count | Frequency (%) | |
| Released | 4795 | 99.8% | |
| Rumored | 5 | 0.1% | |
| Post Production | 3 | 0.1% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 15 |
|---|---|
| Median length | 8 |
| Mean length | 8.003331251 |
| Min length | 7 |
Most occurring characters
| Value | Count | Frequency (%) | |
| e | 14390 | 37.4% | |
| d | 4803 | 12.5% | |
| R | 4800 | 12.5% | |
| s | 4798 | 12.5% | |
| l | 4795 | 12.5% | |
| a | 4795 | 12.5% | |
| o | 14 | < 0.1% | |
| r | 8 | < 0.1% | |
| u | 8 | < 0.1% | |
| P | 6 | < 0.1% | |
| t | 6 | < 0.1% | |
| m | 5 | < 0.1% | |
| (space) | 3 | < 0.1% | |
| c | 3 | < 0.1% | |
| i | 3 | < 0.1% | |
| n | 3 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) | |
| Lowercase Letter | 33631 | 87.5% | |
| Uppercase Letter | 4806 | 12.5% | |
| Space Separator | 3 | < 0.1% |
Most frequent Uppercase Letter characters
| Value | Count | Frequency (%) | |
| R | 4800 | 99.9% | |
| P | 6 | 0.1% |
Most frequent Lowercase Letter characters
| Value | Count | Frequency (%) | |
| e | 14390 | 42.8% | |
| d | 4803 | 14.3% | |
| s | 4798 | 14.3% | |
| l | 4795 | 14.3% | |
| a | 4795 | 14.3% | |
| o | 14 | < 0.1% | |
| r | 8 | < 0.1% | |
| u | 8 | < 0.1% | |
| t | 6 | < 0.1% | |
| m | 5 | < 0.1% | |
| c | 3 | < 0.1% | |
| i | 3 | < 0.1% | |
| n | 3 | < 0.1% |
Most frequent Space Separator characters
| Value | Count | Frequency (%) | |
| (space) | 3 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) | |
| Latin | 38437 | > 99.9% | |
| Common | 3 | < 0.1% |
Most frequent Latin characters
| Value | Count | Frequency (%) | |
| e | 14390 | 37.4% | |
| d | 4803 | 12.5% | |
| R | 4800 | 12.5% | |
| s | 4798 | 12.5% | |
| l | 4795 | 12.5% | |
| a | 4795 | 12.5% | |
| o | 14 | < 0.1% | |
| r | 8 | < 0.1% | |
| u | 8 | < 0.1% | |
| P | 6 | < 0.1% | |
| t | 6 | < 0.1% | |
| m | 5 | < 0.1% | |
| c | 3 | < 0.1% | |
| i | 3 | < 0.1% | |
| n | 3 | < 0.1% |
Most frequent Common characters
| Value | Count | Frequency (%) | |
| (space) | 3 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) | |
| ASCII | 38440 | 100.0% |
Most frequent ASCII characters
| Value | Count | Frequency (%) | |
| e | 14390 | 37.4% | |
| d | 4803 | 12.5% | |
| R | 4800 | 12.5% | |
| s | 4798 | 12.5% | |
| l | 4795 | 12.5% | |
| a | 4795 | 12.5% | |
| o | 14 | < 0.1% | |
| r | 8 | < 0.1% | |
| u | 8 | < 0.1% | |
| P | 6 | < 0.1% | |
| t | 6 | < 0.1% | |
| m | 5 | < 0.1% | |
| (space) | 3 | < 0.1% | |
| c | 3 | < 0.1% | |
| i | 3 | < 0.1% | |
| n | 3 | < 0.1% |
tagline
Categorical
| Distinct | 3945 |
|---|---|
| Distinct (%) | 82.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 37.6 KiB |
| UNK | 844 |
|---|---|
| Based on a true story. | 3 |
| There are two sides to every love story. | 2 |
| The only way out is down. | 2 |
| From zero to hero. | 2 |
| Other values (3940) | 3950 |
| Value | Count | Frequency (%) | |
| UNK | 844 | 17.6% | |
| Based on a true story. | 3 | 0.1% | |
| There are two sides to every love story. | 2 | < 0.1% | |
| The only way out is down. | 2 | < 0.1% | |
| From zero to hero. | 2 | < 0.1% | |
| You never forget your first love. | 2 | < 0.1% | |
| Who's next? | 2 | < 0.1% | |
| Who is John Galt? | 2 | < 0.1% | |
| Worlds Collide | 2 | < 0.1% | |
| One ordinary couple. One little white lie. | 2 | < 0.1% | |
| Be careful what you wish for. | 2 | < 0.1% | |
| Based on the incredible true story. | 2 | < 0.1% | |
| There are no clean getaways. | 2 | < 0.1% | |
| One way in. No way out. | 2 | < 0.1% | |
| What could go wrong? | 2 | < 0.1% | |
| The ball is back! | 1 | < 0.1% | |
| One person can change your life forever | 1 | < 0.1% | |
| Underworld | 1 | < 0.1% | |
| Believe The Unbelievable | 1 | < 0.1% | |
| Suffering? You Haven't Seen Anything Yet... | 1 | < 0.1% | |
| The Most Seductive Evil of All Time Has Now Been Unleashed in Ours. | 1 | < 0.1% | |
| Gangway...For This Years BIG Adventure! | 1 | < 0.1% | |
| Be careful what you wish for... | 1 | < 0.1% | |
| What would you go back for? | 1 | < 0.1% | |
| She's the one in every family. | 1 | < 0.1% | |
| Other values (3920) | 3920 | 81.6% |
Frequencies of value counts
Unique
| Unique | 3930 |
|---|---|
| Unique (%) | 81.8% |
Histogram of lengths of the category
Length
| Max length | 252 |
|---|---|
| Median length | 32 |
| Mean length | 35.13762232 |
| Min length | 3 |
Most occurring characters
| Value | Count | Frequency (%) | |
| (space) | 26926 | 16.0% | |
| e | 17450 | 10.3% | |
| o | 10463 | 6.2% | |
| t | 10358 | 6.1% | |
| a | 8761 | 5.2% | |
| n | 8406 | 5.0% | |
| i | 8137 | 4.8% | |
| r | 7933 | 4.7% | |
| s | 7648 | 4.5% | |
| h | 6587 | 3.9% | |
| . | 5147 | 3.0% | |
| l | 5132 | 3.0% | |
| d | 3951 | 2.3% | |
| u | 3669 | 2.2% | |
| y | 3115 | 1.8% | |
| m | 3089 | 1.8% | |
| g | 2734 | 1.6% | |
| c | 2660 | 1.6% | |
| f | 2446 | 1.4% | |
| w | 2252 | 1.3% | |
| v | 1883 | 1.1% | |
| p | 1572 | 0.9% | |
| b | 1568 | 0.9% | |
| T | 1492 | 0.9% | |
| N | 1274 | 0.8% | |
| Other values (67) | 14113 | 8.4% |
Most occurring categories
| Value | Count | Frequency (%) | |
| Lowercase Letter | 121321 | 71.9% | |
| Space Separator | 26926 | 16.0% | |
| Uppercase Letter | 12232 | 7.2% | |
| Other Punctuation | 7563 | 4.5% | |
| Decimal Number | 521 | 0.3% | |
| Dash Punctuation | 150 | 0.1% | |
| Final Punctuation | 28 | < 0.1% | |
| Open Punctuation | 8 | < 0.1% | |
| Close Punctuation | 8 | < 0.1% | |
| Other Letter | 5 | < 0.1% | |
| Currency Symbol | 4 | < 0.1% |
Most frequent Uppercase Letter characters
| Value | Count | Frequency (%) | |
| T | 1492 | 12.2% | |
| N | 1274 | 10.4% | |
| U | 937 | 7.7% | |
| K | 931 | 7.6% | |
| A | 827 | 6.8% | |
| S | 708 | 5.8% | |
| I | 599 | 4.9% | |
| W | 592 | 4.8% | |
| H | 585 | 4.8% | |
| B | 478 | 3.9% | |
| F | 419 | 3.4% | |
| E | 411 | 3.4% | |
| L | 400 | 3.3% | |
| O | 377 | 3.1% | |
| C | 354 | 2.9% | |
| D | 332 | 2.7% | |
| M | 332 | 2.7% | |
| Y | 297 | 2.4% | |
| R | 241 | 2.0% | |
| G | 238 | 1.9% | |
| P | 234 | 1.9% | |
| J | 104 | 0.9% | |
| V | 50 | 0.4% | |
| Z | 11 | 0.1% | |
| Q | 6 | < 0.1% |
Most frequent Lowercase Letter characters
| Value | Count | Frequency (%) | |
| e | 17450 | 14.4% | |
| o | 10463 | 8.6% | |
| t | 10358 | 8.5% | |
| a | 8761 | 7.2% | |
| n | 8406 | 6.9% | |
| i | 8137 | 6.7% | |
| r | 7933 | 6.5% | |
| s | 7648 | 6.3% | |
| h | 6587 | 5.4% | |
| l | 5132 | 4.2% | |
| d | 3951 | 3.3% | |
| u | 3669 | 3.0% | |
| y | 3115 | 2.6% | |
| m | 3089 | 2.5% | |
| g | 2734 | 2.3% | |
| c | 2660 | 2.2% | |
| f | 2446 | 2.0% | |
| w | 2252 | 1.9% | |
| v | 1883 | 1.6% | |
| p | 1572 | 1.3% | |
| b | 1568 | 1.3% | |
| k | 1025 | 0.8% | |
| x | 195 | 0.2% | |
| j | 161 | 0.1% | |
| z | 71 | 0.1% | |
| Other values (3) | 55 | < 0.1% |
Most frequent Space Separator characters
| Value | Count | Frequency (%) | |
| (space) | 26926 | 100.0% |
Most frequent Other Punctuation characters
| Value | Count | Frequency (%) | |
| . | 5147 | 68.1% | |
| ' | 1039 | 13.7% | |
| , | 725 | 9.6% | |
| ! | 356 | 4.7% | |
| ? | 220 | 2.9% | |
| " | 20 | 0.3% | |
| : | 14 | 0.2% | |
| … | 10 | 0.1% | |
| & | 9 | 0.1% | |
| % | 9 | 0.1% | |
| ; | 5 | 0.1% | |
| # | 5 | 0.1% | |
| * | 3 | < 0.1% | |
| / | 1 | < 0.1% |
Most frequent Dash Punctuation characters
| Value | Count | Frequency (%) | |
| - | 149 | 99.3% | |
| – | 1 | 0.7% |
Most frequent Decimal Number characters
| Value | Count | Frequency (%) | |
| 0 | 162 | 31.1% | |
| 1 | 99 | 19.0% | |
| 2 | 55 | 10.6% | |
| 9 | 38 | 7.3% | |
| 3 | 37 | 7.1% | |
| 7 | 28 | 5.4% | |
| 5 | 27 | 5.2% | |
| 4 | 26 | 5.0% | |
| 6 | 25 | 4.8% | |
| 8 | 24 | 4.6% |
Most frequent Final Punctuation characters
| Value | Count | Frequency (%) | |
| ’ | 28 | 100.0% |
Most frequent Open Punctuation characters
| Value | Count | Frequency (%) | |
| ( | 7 | 87.5% | |
| [ | 1 | 12.5% |
Most frequent Close Punctuation characters
| Value | Count | Frequency (%) | |
| ) | 7 | 87.5% | |
| ] | 1 | 12.5% |
Most frequent Currency Symbol characters
| Value | Count | Frequency (%) | |
| $ | 4 | 100.0% |
Most frequent Other Letter characters
| Value | Count | Frequency (%) | |
| 最 | 1 | 20.0% | |
| 后 | 1 | 20.0% | |
| 的 | 1 | 20.0% | |
| 舞 | 1 | 20.0% | |
| 者 | 1 | 20.0% |
Most occurring scripts
| Value | Count | Frequency (%) | |
| Latin | 133553 | 79.1% | |
| Common | 35208 | 20.9% | |
| Han | 5 | < 0.1% |
Most frequent Latin characters
| Value | Count | Frequency (%) | |
| e | 17450 | 13.1% | |
| o | 10463 | 7.8% | |
| t | 10358 | 7.8% | |
| a | 8761 | 6.6% | |
| n | 8406 | 6.3% | |
| i | 8137 | 6.1% | |
| r | 7933 | 5.9% | |
| s | 7648 | 5.7% | |
| h | 6587 | 4.9% | |
| l | 5132 | 3.8% | |
| d | 3951 | 3.0% | |
| u | 3669 | 2.7% | |
| y | 3115 | 2.3% | |
| m | 3089 | 2.3% | |
| g | 2734 | 2.0% | |
| c | 2660 | 2.0% | |
| f | 2446 | 1.8% | |
| w | 2252 | 1.7% | |
| v | 1883 | 1.4% | |
| p | 1572 | 1.2% | |
| b | 1568 | 1.2% | |
| T | 1492 | 1.1% | |
| N | 1274 | 1.0% | |
| k | 1025 | 0.8% | |
| U | 937 | 0.7% | |
| Other values (29) | 9011 | 6.7% |
Most frequent Common characters
| Value | Count | Frequency (%) | |
| (space) | 26926 | 76.5% | |
| . | 5147 | 14.6% | |
| ' | 1039 | 3.0% | |
| , | 725 | 2.1% | |
| ! | 356 | 1.0% | |
| ? | 220 | 0.6% | |
| 0 | 162 | 0.5% | |
| - | 149 | 0.4% | |
| 1 | 99 | 0.3% | |
| 2 | 55 | 0.2% | |
| 9 | 38 | 0.1% | |
| 3 | 37 | 0.1% | |
| ’ | 28 | 0.1% | |
| 7 | 28 | 0.1% | |
| 5 | 27 | 0.1% | |
| 4 | 26 | 0.1% | |
| 6 | 25 | 0.1% | |
| 8 | 24 | 0.1% | |
| " | 20 | 0.1% | |
| : | 14 | < 0.1% | |
| … | 10 | < 0.1% | |
| & | 9 | < 0.1% | |
| % | 9 | < 0.1% | |
| ( | 7 | < 0.1% | |
| ) | 7 | < 0.1% | |
| Other values (8) | 21 | 0.1% |
Most frequent Han characters
| Value | Count | Frequency (%) | |
| 最 | 1 | 20.0% | |
| 后 | 1 | 20.0% | |
| 的 | 1 | 20.0% | |
| 舞 | 1 | 20.0% | |
| 者 | 1 | 20.0% |
Most occurring blocks
| Value | Count | Frequency (%) | |
| ASCII | 168720 | > 99.9% | |
| Punctuation | 39 | < 0.1% | |
| CJK | 5 | < 0.1% | |
| None | 2 | < 0.1% |
Most frequent ASCII characters
| Value | Count | Frequency (%) | |
| (space) | 26926 | 16.0% | |
| e | 17450 | 10.3% | |
| o | 10463 | 6.2% | |
| t | 10358 | 6.1% | |
| a | 8761 | 5.2% | |
| n | 8406 | 5.0% | |
| i | 8137 | 4.8% | |
| r | 7933 | 4.7% | |
| s | 7648 | 4.5% | |
| h | 6587 | 3.9% | |
| . | 5147 | 3.1% | |
| l | 5132 | 3.0% | |
| d | 3951 | 2.3% | |
| u | 3669 | 2.2% | |
| y | 3115 | 1.8% | |
| m | 3089 | 1.8% | |
| g | 2734 | 1.6% | |
| c | 2660 | 1.6% | |
| f | 2446 | 1.4% | |
| w | 2252 | 1.3% | |
| v | 1883 | 1.1% | |
| p | 1572 | 0.9% | |
| b | 1568 | 0.9% | |
| T | 1492 | 0.9% | |
| N | 1274 | 0.8% | |
| Other values (57) | 14067 | 8.3% |
Most frequent Punctuation characters
| Value | Count | Frequency (%) | |
| ’ | 28 | 71.8% | |
| … | 10 | 25.6% | |
| – | 1 | 2.6% |
Most frequent None characters
| Value | Count | Frequency (%) | |
| á | 1 | 50.0% | |
| é | 1 | 50.0% |
Most frequent CJK characters
| Value | Count | Frequency (%) | |
| 最 | 1 | 20.0% | |
| 后 | 1 | 20.0% | |
| 的 | 1 | 20.0% | |
| 舞 | 1 | 20.0% | |
| 者 | 1 | 20.0% |
| Distinct | 4800 |
|---|---|
| Distinct (%) | 99.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 37.6 KiB |
| Batman | 2 |
|---|---|
| Out of the Blue | 2 |
| The Host | 2 |
| Slither | 1 |
| Appaloosa | 1 |
| Other values (4795) |
| Value | Count | Frequency (%) | |
| Batman | 2 | < 0.1% | |
| Out of the Blue | 2 | < 0.1% | |
| The Host | 2 | < 0.1% | |
| Slither | 1 | < 0.1% | |
| Appaloosa | 1 | < 0.1% | |
| Wimbledon | 1 | < 0.1% | |
| Knocked Up | 1 | < 0.1% | |
| Fever Pitch | 1 | < 0.1% | |
| Miracle at St. Anna | 1 | < 0.1% | |
| Bandidas | 1 | < 0.1% | |
| The Benchwarmers | 1 | < 0.1% | |
| The Bounty Hunter | 1 | < 0.1% | |
| The Warlords | 1 | < 0.1% | |
| Boyhood | 1 | < 0.1% | |
| Killer Elite | 1 | < 0.1% | |
| Kansas City | 1 | < 0.1% | |
| The Grudge | 1 | < 0.1% | |
| The Monkey King 2 | 1 | < 0.1% | |
| Love & Basketball | 1 | < 0.1% | |
| Kung Pow: Enter the Fist | 1 | < 0.1% | |
| Krrish | 1 | < 0.1% | |
| R.I.P.D. | 1 | < 0.1% | |
| Miss Congeniality | 1 | < 0.1% | |
| Like Crazy | 1 | < 0.1% | |
| Unfriended | 1 | < 0.1% | |
| Other values (4775) | 4775 | 99.4% |
Frequencies of value counts
Unique
| Unique | 4797 |
|---|---|
| Unique (%) | 99.9% |
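The Distinct and Unique figures for this variable differ (4800 versus 4797) because three titles occur twice. The sketch below shows one way to reproduce that distinction, assuming that "Distinct" counts every different value and "Unique" counts only values occurring exactly once. The helper name `distinct_and_unique` and the sample list are illustrative and not taken from the report.

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (distinct, unique) for an iterable of hashable values.

    distinct: number of different values present
    unique:   number of values that occur exactly once
    """
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique

# Illustrative sample: two titles repeated, two titles appearing once.
titles = ["Batman", "Batman", "The Host", "The Host", "Slither", "Appaloosa"]
print(distinct_and_unique(titles))  # (4, 2)
```

Applied to the figures above, 4797 values occurring once plus 3 values occurring twice gives 4800 distinct values and 4803 observations.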
Histogram of lengths of the category
Length
| Max length | 86 |
|---|---|
| Median length | 14 |
| Mean length | 15.34915678 |
| Min length | 1 |
Most occurring characters
| Value | Count | Frequency (%) | |
| (space) | 8553 | 11.6% | |
| e | 7525 | 10.2% | |
| a | 4632 | 6.3% | |
| o | 4470 | 6.1% | |
| n | 3950 | 5.4% | |
| r | 3946 | 5.4% | |
| i | 3765 | 5.1% | |
| t | 3660 | 5.0% | |
| s | 2862 | 3.9% | |
| h | 2852 | 3.9% | |
| l | 2417 | 3.3% | |
| d | 1784 | 2.4% | |
| T | 1668 | 2.3% | |
| u | 1506 | 2.0% | |
| c | 1180 | 1.6% | |
| g | 1158 | 1.6% | |
| y | 1120 | 1.5% | |
| m | 1060 | 1.4% | |
| S | 1007 | 1.4% | |
| f | 859 | 1.2% | |
| M | 800 | 1.1% | |
| B | 756 | 1.0% | |
| p | 690 | 0.9% | |
| D | 687 | 0.9% | |
| C | 663 | 0.9% | |
| Other values (73) | 10152 | 13.8% |
Most occurring categories
| Value | Count | Frequency (%) | |
| Lowercase Letter | 51907 | 70.4% | |
| Uppercase Letter | 11748 | 15.9% | |
| Space Separator | 8553 | 11.6% | |
| Other Punctuation | 909 | 1.2% | |
| Decimal Number | 494 | 0.7% | |
| Dash Punctuation | 82 | 0.1% | |
| Open Punctuation | 7 | < 0.1% | |
| Close Punctuation | 7 | < 0.1% | |
| Other Number | 4 | < 0.1% | |
| Currency Symbol | 4 | < 0.1% | |
| Final Punctuation | 3 | < 0.1% | |
| Math Symbol | 2 | < 0.1% | |
| Connector Punctuation | 1 | < 0.1% | |
| Other Symbol | 1 | < 0.1% |
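The category names in the table above correspond to Unicode general categories (for example `Lu` is Uppercase Letter and `Zs` is Space Separator). The following is a minimal sketch of a per-character tally using Python's standard `unicodedata` module. It assumes the report classifies each character independently; the helper `category_counts`, the partial name mapping, and the sample titles are illustrative only.

```python
import unicodedata
from collections import Counter

# Long names for some general-category codes; codes not listed here
# are reported under their two-letter abbreviation.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
    "No": "Other Number",
}

def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts

# Illustrative sample of titles that appear in the value table.
sample = ["The Host", "R.I.P.D.", "The Monkey King 2"]
for name, count in category_counts(sample).most_common():
    print(name, count)
```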
Most frequent Uppercase Letter characters
| Value | Count | Frequency (%) | |
| T | 1668 | 14.2% | |
| S | 1007 | 8.6% | |
| M | 800 | 6.8% | |
| B | 756 | 6.4% | |
| D | 687 | 5.8% | |
| C | 663 | 5.6% | |
| A | 640 | 5.4% | |
| L | 543 | 4.6% | |
| H | 541 | 4.6% | |
| W | 500 | 4.3% | |
| P | 471 | 4.0% | |
| G | 468 | 4.0% | |
| R | 467 | 4.0% | |
| F | 453 | 3.9% | |
| I | 453 | 3.9% | |
| E | 302 | 2.6% | |
| N | 265 | 2.3% | |
| O | 221 | 1.9% | |
| J | 191 | 1.6% | |
| K | 184 | 1.6% | |
| V | 142 | 1.2% | |
| Y | 125 | 1.1% | |
| U | 112 | 1.0% | |
| Z | 41 | 0.3% | |
| Q | 26 | 0.2% | |
| Other values (2) | 22 | 0.2% |
Most frequent Lowercase Letter characters
| Value | Count | Frequency (%) | |
| e | 7525 | 14.5% | |
| a | 4632 | 8.9% | |
| o | 4470 | 8.6% | |
| n | 3950 | 7.6% | |
| r | 3946 | 7.6% | |
| i | 3765 | 7.3% | |
| t | 3660 | 7.1% | |
| s | 2862 | 5.5% | |
| h | 2852 | 5.5% | |
| l | 2417 | 4.7% | |
| d | 1784 | 3.4% | |
| u | 1506 | 2.9% | |
| c | 1180 | 2.3% | |
| g | 1158 | 2.2% | |
| y | 1120 | 2.2% | |
| m | 1060 | 2.0% | |
| f | 859 | 1.7% | |
| p | 690 | 1.3% | |
| k | 637 | 1.2% | |
| v | 580 | 1.1% | |
| w | 483 | 0.9% | |
| b | 456 | 0.9% | |
| x | 139 | 0.3% | |
| z | 94 | 0.2% | |
| j | 41 | 0.1% | |
| Other values (8) | 41 | 0.1% |
Most frequent Space Separator characters
| Value | Count | Frequency (%) | |
| (space) | 8553 | 100.0% |
Most frequent Other Punctuation characters
| Value | Count | Frequency (%) | |
| : | 351 | 38.6% | |
| ' | 221 | 24.3% | |
| . | 141 | 15.5% | |
| , | 75 | 8.3% | |
| & | 61 | 6.7% | |
| ! | 31 | 3.4% | |
| ? | 17 | 1.9% | |
| / | 7 | 0.8% | |
| # | 2 | 0.2% | |
| * | 2 | 0.2% | |
| · | 1 | 0.1% |
Most frequent Dash Punctuation characters
| Value | Count | Frequency (%) | |
| - | 80 | 97.6% | |
| – | 2 | 2.4% |
Most frequent Decimal Number characters
| Value | Count | Frequency (%) | |
| 2 | 146 | 29.6% | |
| 1 | 79 | 16.0% | |
| 0 | 77 | 15.6% | |
| 3 | 72 | 14.6% | |
| 4 | 33 | 6.7% | |
| 8 | 21 | 4.3% | |
| 5 | 21 | 4.3% | |
| 9 | 16 | 3.2% | |
| 7 | 15 | 3.0% | |
| 6 | 14 | 2.8% |
Most frequent Other Number characters
| Value | Count | Frequency (%) | |
| ³ | 1 | 25.0% | |
| ⅓ | 1 | 25.0% | |
| ½ | 1 | 25.0% | |
| ² | 1 | 25.0% |
Most frequent Open Punctuation characters
| Value | Count | Frequency (%) | |
| ( | 5 | 71.4% | |
| [ | 2 | 28.6% |
Most frequent Close Punctuation characters
| Value | Count | Frequency (%) | |
| ) | 5 | 71.4% | |
| ] | 2 | 28.6% |
Most frequent Currency Symbol characters
| Value | Count | Frequency (%) | |
| ¢ | 2 | 50.0% | |
| $ | 2 | 50.0% |
Most frequent Math Symbol characters
| Value | Count | Frequency (%) | |
| + | 2 | 100.0% |
Most frequent Final Punctuation characters
| Value | Count | Frequency (%) | |
| ’ | 3 | 100.0% |
Most frequent Connector Punctuation characters
| Value | Count | Frequency (%) | |
| _ | 1 | 100.0% |
Most frequent Other Symbol characters
| Value | Count | Frequency (%) | |
| ° | 1 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) | |
| Latin | 63655 | 86.3% | |
| Common | 10067 | 13.7% |
Most frequent Latin characters
| Value | Count | Frequency (%) | |
| e | 7525 | 11.8% | |
| a | 4632 | 7.3% | |
| o | 4470 | 7.0% | |
| n | 3950 | 6.2% | |
| r | 3946 | 6.2% | |
| i | 3765 | 5.9% | |
| t | 3660 | 5.7% | |
| s | 2862 | 4.5% | |
| h | 2852 | 4.5% | |
| l | 2417 | 3.8% | |
| d | 1784 | 2.8% | |
| T | 1668 | 2.6% | |
| u | 1506 | 2.4% | |
| c | 1180 | 1.9% | |
| g | 1158 | 1.8% | |
| y | 1120 | 1.8% | |
| m | 1060 | 1.7% | |
| S | 1007 | 1.6% | |
| f | 859 | 1.3% | |
| M | 800 | 1.3% | |
| B | 756 | 1.2% | |
| p | 690 | 1.1% | |
| D | 687 | 1.1% | |
| C | 663 | 1.0% | |
| A | 640 | 1.0% | |
| Other values (35) | 7998 | 12.6% |
Most frequent Common characters
| Value | Count | Frequency (%) | |
| (space) | 8553 | 85.0% | |
| : | 351 | 3.5% | |
| ' | 221 | 2.2% | |
| 2 | 146 | 1.5% | |
| . | 141 | 1.4% | |
| - | 80 | 0.8% | |
| 1 | 79 | 0.8% | |
| 0 | 77 | 0.8% | |
| , | 75 | 0.7% | |
| 3 | 72 | 0.7% | |
| & | 61 | 0.6% | |
| 4 | 33 | 0.3% | |
| ! | 31 | 0.3% | |
| 8 | 21 | 0.2% | |
| 5 | 21 | 0.2% | |
| ? | 17 | 0.2% | |
| 9 | 16 | 0.2% | |
| 7 | 15 | 0.1% | |
| 6 | 14 | 0.1% | |
| / | 7 | 0.1% | |
| ( | 5 | < 0.1% | |
| ) | 5 | < 0.1% | |
| ’ | 3 | < 0.1% | |
| ¢ | 2 | < 0.1% | |
| + | 2 | < 0.1% | |
| Other values (13) | 19 | 0.2% |
Most occurring blocks
| Value | Count | Frequency (%) | |
| ASCII | 73696 | > 99.9% | |
| None | 20 | < 0.1% | |
| Punctuation | 5 | < 0.1% | |
| Number Forms | 1 | < 0.1% |
Most frequent ASCII characters
| Value | Count | Frequency (%) | |
| (space) | 8553 | 11.6% | |
| e | 7525 | 10.2% | |
| a | 4632 | 6.3% | |
| o | 4470 | 6.1% | |
| n | 3950 | 5.4% | |
| r | 3946 | 5.4% | |
| i | 3765 | 5.1% | |
| t | 3660 | 5.0% | |
| s | 2862 | 3.9% | |
| h | 2852 | 3.9% | |
| l | 2417 | 3.3% | |
| d | 1784 | 2.4% | |
| T | 1668 | 2.3% | |
| u | 1506 | 2.0% | |
| c | 1180 | 1.6% | |
| g | 1158 | 1.6% | |
| y | 1120 | 1.5% | |
| m | 1060 | 1.4% | |
| S | 1007 | 1.4% | |
| f | 859 | 1.2% | |
| M | 800 | 1.1% | |
| B | 756 | 1.0% | |
| p | 690 | 0.9% | |
| D | 687 | 0.9% | |
| C | 663 | 0.9% | |
| Other values (56) | 10126 | 13.7% |
Most frequent None characters
| Value | Count | Frequency (%) | |
| é | 6 | 30.0% | |
| ¢ | 2 | 10.0% | |
| · | 1 | 5.0% | |
| à | 1 | 5.0% | |
| ³ | 1 | 5.0% | |
| Æ | 1 | 5.0% | |
| ü | 1 | 5.0% | |
| ½ | 1 | 5.0% | |
| ë | 1 | 5.0% | |
| ² | 1 | 5.0% | |
| á | 1 | 5.0% | |
| ó | 1 | 5.0% | |
| ñ | 1 | 5.0% | |
| ° | 1 | 5.0% |
Most frequent Number Forms characters
| Value | Count | Frequency (%) | |
| ⅓ | 1 | 100.0% |
Most frequent Punctuation characters
| Value | Count | Frequency (%) | |
| ’ | 3 | 60.0% | |
| – | 2 | 40.0% |
| Distinct | 71 |
|---|---|
| Distinct (%) | 1.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.092171559 |
|---|---|
| Minimum | 0 |
| Maximum | 10 |
| Zeros | 63 |
| Zeros (%) | 1.3% |
| Memory size | 37.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 4.3 |
| Q1 | 5.6 |
| median | 6.2 |
| Q3 | 6.8 |
| 95-th percentile | 7.6 |
| Maximum | 10 |
| Range | 10 |
| Interquartile range (IQR) | 1.2 |
Descriptive statistics
| Standard deviation | 1.194612163 |
|---|---|
| Coefficient of variation (CV) | 0.1960897114 |
| Kurtosis | 7.792362845 |
| Mean | 6.092171559 |
| Median Absolute Deviation (MAD) | 0.6 |
| Skewness | -1.959710007 |
| Sum | 29260.7 |
| Variance | 1.42709822 |
| Monotonicity | Not monotonic |
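Several of the statistics above follow from one another by standard definitions: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, the IQR is Q3 minus Q1, and the range is maximum minus minimum. The sketch below re-derives them from the reported values as a consistency check. It does not reproduce how the report itself computes them, and the variable names are illustrative.

```python
import math

# Values copied from the tables above.
std = 1.194612163
mean = 6.092171559
q1, q3 = 5.6, 6.8
minimum, maximum = 0, 10

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance as squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

# Compare with the reported figures, allowing for rounding.
print(math.isclose(cv, 0.1960897114, rel_tol=1e-6))
print(math.isclose(variance, 1.42709822, rel_tol=1e-6))
print(math.isclose(iqr, 1.2), value_range == 10)
```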
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
| 6.5 | 216 | 4.5% | |
| 6 | 216 | 4.5% | |
| 6.7 | 213 | 4.4% | |
| 6.3 | 207 | 4.3% | |
| 6.1 | 201 | 4.2% | |
| 6.4 | 201 | 4.2% | |
| 6.2 | 200 | 4.2% | |
| 6.6 | 198 | 4.1% | |
| 5.9 | 196 | 4.1% | |
| 5.8 | 187 | 3.9% | |
| 7 | 179 | 3.7% | |
| 6.8 | 172 | 3.6% | |
| 6.9 | 160 | 3.3% | |
| 5.7 | 153 | 3.2% | |
| 5.5 | 152 | 3.2% | |
| 5.6 | 144 | 3.0% | |
| 5.4 | 127 | 2.6% | |
| 7.3 | 125 | 2.6% | |
| 7.1 | 119 | 2.5% | |
| 7.2 | 119 | 2.5% | |
| 7.4 | 109 | 2.3% | |
| 5.3 | 105 | 2.2% | |
| 5.2 | 93 | 1.9% | |
| 5 | 86 | 1.8% | |
| 7.5 | 66 | 1.4% | |
| Other values (46) | 859 | 17.9% |
| Value | Count | Frequency (%) | |
| 0 | 63 | 1.3% | |
| 0.5 | 1 | < 0.1% | |
| 1 | 2 | < 0.1% | |
| 1.9 | 1 | < 0.1% | |
| 2 | 6 | 0.1% | |
| 2.2 | 1 | < 0.1% | |
| 2.3 | 2 | < 0.1% | |
| 2.4 | 1 | < 0.1% | |
| 2.6 | 1 | < 0.1% | |
| 2.7 | 1 | < 0.1% |
| Value | Count | Frequency (%) | |
| 10 | 4 | 0.1% | |
| 9.5 | 1 | < 0.1% | |
| 9.3 | 1 | < 0.1% | |
| 8.5 | 2 | < 0.1% | |
| 8.4 | 2 | < 0.1% | |
| 8.3 | 7 | 0.1% | |
| 8.2 | 15 | 0.3% | |
| 8.1 | 18 | 0.4% | |
| 8 | 35 | 0.7% | |
| 7.9 | 32 | 0.7% |
| Distinct | 1609 |
|---|---|
| Distinct (%) | 33.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 690.2179888 |
|---|---|
| Minimum | 0 |
| Maximum | 13752 |
| Zeros | 62 |
| Zeros (%) | 1.3% |
| Memory size | 37.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 54 |
| median | 235 |
| Q3 | 737 |
| 95-th percentile | 3040.9 |
| Maximum | 13752 |
| Range | 13752 |
| Interquartile range (IQR) | 683 |
Descriptive statistics
| Standard deviation | 1234.585891 |
|---|---|
| Coefficient of variation (CV) | 1.788689821 |
| Kurtosis | 19.91394618 |
| Mean | 690.2179888 |
| Median Absolute Deviation (MAD) | 214 |
| Skewness | 3.824068535 |
| Sum | 3315117 |
| Variance | 1524202.322 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) | |
| 0 | 62 | 1.3% | |
| 1 | 53 | 1.1% | |
| 2 | 46 | 1.0% | |
| 4 | 43 | 0.9% | |
| 3 | 41 | 0.9% | |
| 6 | 38 | 0.8% | |
| 8 | 37 | 0.8% | |
| 10 | 34 | 0.7% | |
| 11 | 32 | 0.7% | |
| 9 | 32 | 0.7% | |
| 7 | 31 | 0.6% | |
| 5 | 28 | 0.6% | |
| 15 | 26 | 0.5% | |
| 19 | 26 | 0.5% | |
| 12 | 26 | 0.5% | |
| 13 | 25 | 0.5% | |
| 16 | 24 | 0.5% | |
| 22 | 23 | 0.5% | |
| 34 | 23 | 0.5% | |
| 31 | 22 | 0.5% | |
| 24 | 22 | 0.5% | |
| 18 | 22 | 0.5% | |
| 17 | 21 | 0.4% | |
| 25 | 20 | 0.4% | |
| 26 | 20 | 0.4% | |
| Other values (1584) | 4026 | 83.8% |
| Value | Count | Frequency (%) | |
| 0 | 62 | 1.3% | |
| 1 | 53 | 1.1% | |
| 2 | 46 | 1.0% | |
| 3 | 41 | 0.9% | |
| 4 | 43 | 0.9% | |
| 5 | 28 | 0.6% | |
| 6 | 38 | 0.8% | |
| 7 | 31 | 0.6% | |
| 8 | 37 | 0.8% | |
| 9 | 32 | 0.7% |
| Value | Count | Frequency (%) | |
| 13752 | 1 | < 0.1% | |
| 12002 | 1 | < 0.1% | |
| 11800 | 1 | < 0.1% | |
| 11776 | 1 | < 0.1% | |
| 10995 | 1 | < 0.1% | |
| 10867 | 1 | < 0.1% | |
| 10099 | 1 | < 0.1% | |
| 9742 | 1 | < 0.1% | |
| 9455 | 1 | < 0.1% | |
| 9427 | 1 | < 0.1% |
| Distinct | 71 |
|---|---|
| Distinct (%) | 1.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 37.6 KiB |
| United States of America | 3102 |
|---|---|
| United Kingdom | 374 |
| Canada | 220 |
| Germany | 200 |
| UNK | 174 |
| Other values (66) |
| Value | Count | Frequency (%) | |
| United States of America | 3102 | 64.6% | |
| United Kingdom | 374 | 7.8% | |
| Canada | 220 | 4.6% | |
| Germany | 200 | 4.2% | |
| UNK | 174 | 3.6% | |
| France | 174 | 3.6% | |
| Australia | 87 | 1.8% | |
| India | 42 | 0.9% | |
| China | 40 | 0.8% | |
| Japan | 34 | 0.7% | |
| Spain | 34 | 0.7% | |
| Italy | 26 | 0.5% | |
| Ireland | 22 | 0.5% | |
| Mexico | 22 | 0.5% | |
| New Zealand | 22 | 0.5% | |
| Hong Kong | 22 | 0.5% | |
| Czech Republic | 18 | 0.4% | |
| Belgium | 17 | 0.4% | |
| Denmark | 14 | 0.3% | |
| South Korea | 13 | 0.3% | |
| Brazil | 13 | 0.3% | |
| Russia | 11 | 0.2% | |
| Switzerland | 10 | 0.2% | |
| Netherlands | 10 | 0.2% | |
| South Africa | 9 | 0.2% | |
| Other values (46) | 93 | 1.9% |
Frequencies of value counts
Unique
| Unique | 28 |
|---|---|
| Unique (%) | 0.6% |
Histogram of lengths of the category
Length
| Max length | 24 |
|---|---|
| Median length | 24 |
| Mean length | 18.37164272 |
| Min length | 3 |
Most occurring characters
| Value | Count | Frequency (%) | |
| e | 10311 | 11.7% | |
| t | 9865 | 11.2% | |
| (space) | 9784 | 11.1% | |
| a | 7890 | 8.9% | |
| i | 7321 | 8.3% | |
| n | 4784 | 5.4% | |
| d | 4200 | 4.8% | |
| m | 3729 | 4.2% | |
| r | 3716 | 4.2% | |
| U | 3657 | 4.1% | |
| o | 3605 | 4.1% | |
| c | 3356 | 3.8% | |
| s | 3246 | 3.7% | |
| A | 3221 | 3.7% | |
| S | 3174 | 3.6% | |
| f | 3112 | 3.5% | |
| K | 585 | 0.7% | |
| g | 456 | 0.5% | |
| C | 280 | 0.3% | |
| l | 250 | 0.3% | |
| y | 245 | 0.3% | |
| N | 212 | 0.2% | |
| G | 205 | 0.2% | |
| u | 182 | 0.2% | |
| F | 178 | 0.2% | |
| Other values (21) | 675 | 0.8% |
Most occurring categories
| Value | Count | Frequency (%) | |
| Lowercase Letter | 66623 | 75.5% | |
| Uppercase Letter | 11832 | 13.4% | |
| Space Separator | 9784 | 11.1% |
Most frequent Uppercase Letter characters
| Value | Count | Frequency (%) | |
| U | 3657 | 30.9% | |
| A | 3221 | 27.2% | |
| S | 3174 | 26.8% | |
| K | 585 | 4.9% | |
| C | 280 | 2.4% | |
| N | 212 | 1.8% | |
| G | 205 | 1.7% | |
| F | 178 | 1.5% | |
| I | 98 | 0.8% | |
| B | 40 | 0.3% | |
| J | 37 | 0.3% | |
| R | 35 | 0.3% | |
| H | 26 | 0.2% | |
| M | 26 | 0.2% | |
| Z | 22 | 0.2% | |
| D | 15 | 0.1% | |
| E | 8 | 0.1% | |
| P | 5 | < 0.1% | |
| L | 4 | < 0.1% | |
| T | 4 | < 0.1% |
Most frequent Lowercase Letter characters
| Value | Count | Frequency (%) | |
| e | 10311 | 15.5% | |
| t | 9865 | 14.8% | |
| a | 7890 | 11.8% | |
| i | 7321 | 11.0% | |
| n | 4784 | 7.2% | |
| d | 4200 | 6.3% | |
| m | 3729 | 5.6% | |
| r | 3716 | 5.6% | |
| o | 3605 | 5.4% | |
| c | 3356 | 5.0% | |
| s | 3246 | 4.9% | |
| f | 3112 | 4.7% | |
| g | 456 | 0.7% | |
| l | 250 | 0.4% | |
| y | 245 | 0.4% | |
| u | 182 | 0.3% | |
| h | 100 | 0.2% | |
| p | 93 | 0.1% | |
| z | 43 | 0.1% | |
| w | 42 | 0.1% | |
| b | 33 | < 0.1% | |
| x | 24 | < 0.1% | |
| k | 16 | < 0.1% | |
| v | 3 | < 0.1% | |
| j | 1 | < 0.1% |
Most frequent Space Separator characters
| Value | Count | Frequency (%) | |
| (space) | 9784 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) | |
| Latin | 78455 | 88.9% | |
| Common | 9784 | 11.1% |
Most frequent Latin characters
| Value | Count | Frequency (%) | |
| e | 10311 | 13.1% | |
| t | 9865 | 12.6% | |
| a | 7890 | 10.1% | |
| i | 7321 | 9.3% | |
| n | 4784 | 6.1% | |
| d | 4200 | 5.4% | |
| m | 3729 | 4.8% | |
| r | 3716 | 4.7% | |
| U | 3657 | 4.7% | |
| o | 3605 | 4.6% | |
| c | 3356 | 4.3% | |
| s | 3246 | 4.1% | |
| A | 3221 | 4.1% | |
| S | 3174 | 4.0% | |
| f | 3112 | 4.0% | |
| K | 585 | 0.7% | |
| g | 456 | 0.6% | |
| C | 280 | 0.4% | |
| l | 250 | 0.3% | |
| y | 245 | 0.3% | |
| N | 212 | 0.3% | |
| G | 205 | 0.3% | |
| u | 182 | 0.2% | |
| F | 178 | 0.2% | |
| h | 100 | 0.1% | |
| Other values (20) | 575 | 0.7% |
Most frequent Common characters
| Value | Count | Frequency (%) | |
| (space) | 9784 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) | |
| ASCII | 88239 | 100.0% |
Most frequent ASCII characters
| Value | Count | Frequency (%) | |
| e | 10311 | 11.7% | |
| t | 9865 | 11.2% | |
| (space) | 9784 | 11.1% | |
| a | 7890 | 8.9% | |
| i | 7321 | 8.3% | |
| n | 4784 | 5.4% | |
| d | 4200 | 4.8% | |
| m | 3729 | 4.2% | |
| r | 3716 | 4.2% | |
| U | 3657 | 4.1% | |
| o | 3605 | 4.1% | |
| c | 3356 | 3.8% | |
| s | 3246 | 3.7% | |
| A | 3221 | 3.7% | |
| S | 3174 | 3.6% | |
| f | 3112 | 3.5% | |
| K | 585 | 0.7% | |
| g | 456 | 0.5% | |
| C | 280 | 0.3% | |
| l | 250 | 0.3% | |
| y | 245 | 0.3% | |
| N | 212 | 0.2% | |
| G | 205 | 0.2% | |
| u | 182 | 0.2% | |
| F | 178 | 0.2% | |
| Other values (21) | 675 | 0.8% |
| Distinct | 2350 |
|---|---|
| Distinct (%) | 48.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 37.6 KiB |
| UNK | 30 |
|---|---|
| Steven Spielberg | 27 |
| Woody Allen | 21 |
| Clint Eastwood | 20 |
| Martin Scorsese | 20 |
| Other values (2345) |
| Value | Count | Frequency (%) | |
| UNK | 30 | 0.6% | |
| Steven Spielberg | 27 | 0.6% | |
| Woody Allen | 21 | 0.4% | |
| Clint Eastwood | 20 | 0.4% | |
| Martin Scorsese | 20 | 0.4% | |
| Spike Lee | 16 | 0.3% | |
| Ridley Scott | 16 | 0.3% | |
| Robert Rodriguez | 16 | 0.3% | |
| Renny Harlin | 15 | 0.3% | |
| Steven Soderbergh | 15 | 0.3% | |
| Oliver Stone | 14 | 0.3% | |
| Tim Burton | 14 | 0.3% | |
| Barry Levinson | 13 | 0.3% | |
| Joel Schumacher | 13 | 0.3% | |
| Robert Zemeckis | 13 | 0.3% | |
| Ron Howard | 13 | 0.3% | |
| Brian De Palma | 12 | 0.2% | |
| Francis Ford Coppola | 12 | 0.2% | |
| Michael Bay | 12 | 0.2% | |
| Tony Scott | 12 | 0.2% | |
| Kevin Smith | 12 | 0.2% | |
| Richard Donner | 11 | 0.2% | |
| Joel Coen | 11 | 0.2% | |
| Chris Columbus | 11 | 0.2% | |
| Richard Linklater | 11 | 0.2% | |
| Other values (2325) | 4423 | 92.1% |
Frequencies of value counts
Unique
| Unique | 1475 |
|---|---|
| Unique (%) | 30.7% |
Histogram of lengths of the category
Length
| Max length | 32 |
|---|---|
| Median length | 13 |
| Mean length | 13.05579846 |
| Min length | 3 |
Most occurring characters
| Value | Count | Frequency (%) | |
| e | 5897 | 9.4% | |
| (space) | 5169 | 8.2% | |
| a | 5099 | 8.1% | |
| n | 4523 | 7.2% | |
| r | 4288 | 6.8% | |
| o | 3686 | 5.9% | |
| i | 3591 | 5.7% | |
| l | 2887 | 4.6% | |
| t | 2235 | 3.6% | |
| s | 2019 | 3.2% | |
| h | 1789 | 2.9% | |
| d | 1517 | 2.4% | |
| c | 1389 | 2.2% | |
| m | 1193 | 1.9% | |
| u | 1127 | 1.8% | |
| y | 1103 | 1.8% | |
| S | 990 | 1.6% | |
| k | 930 | 1.5% | |
| J | 895 | 1.4% | |
| M | 858 | 1.4% | |
| g | 802 | 1.3% | |
| R | 741 | 1.2% | |
| C | 692 | 1.1% | |
| B | 642 | 1.0% | |
| v | 617 | 1.0% | |
| Other values (58) | 8028 | 12.8% |
Most occurring categories
| Value | Count | Frequency (%) | |
| Lowercase Letter | 46971 | 74.9% | |
| Uppercase Letter | 10230 | 16.3% | |
| Space Separator | 5169 | 8.2% | |
| Other Punctuation | 254 | 0.4% | |
| Dash Punctuation | 83 | 0.1% |
Most frequent Uppercase Letter characters
| Value | Count | Frequency (%) | |
| S | 990 | 9.7% | |
| J | 895 | 8.7% | |
| M | 858 | 8.4% | |
| R | 741 | 7.2% | |
| C | 692 | 6.8% | |
| B | 642 | 6.3% | |
| D | 587 | 5.7% | |
| A | 530 | 5.2% | |
| L | 488 | 4.8% | |
| G | 468 | 4.6% | |
| P | 465 | 4.5% | |
| T | 437 | 4.3% | |
| H | 405 | 4.0% | |
| W | 383 | 3.7% | |
| K | 357 | 3.5% | |
| F | 353 | 3.5% | |
| N | 268 | 2.6% | |
| E | 177 | 1.7% | |
| O | 104 | 1.0% | |
| V | 101 | 1.0% | |
| Z | 91 | 0.9% | |
| I | 74 | 0.7% | |
| U | 47 | 0.5% | |
| Y | 45 | 0.4% | |
| Q | 19 | 0.2% | |
| Other values (7) | 13 | 0.1% |
Most frequent Lowercase Letter characters
| Value | Count | Frequency (%) | |
| e | 5897 | 12.6% | |
| a | 5099 | 10.9% | |
| n | 4523 | 9.6% | |
| r | 4288 | 9.1% | |
| o | 3686 | 7.8% | |
| i | 3591 | 7.6% | |
| l | 2887 | 6.1% | |
| t | 2235 | 4.8% | |
| s | 2019 | 4.3% | |
| h | 1789 | 3.8% | |
| d | 1517 | 3.2% | |
| c | 1389 | 3.0% | |
| m | 1193 | 2.5% | |
| u | 1127 | 2.4% | |
| y | 1103 | 2.3% | |
| k | 930 | 2.0% | |
| g | 802 | 1.7% | |
| v | 617 | 1.3% | |
| b | 571 | 1.2% | |
| p | 444 | 0.9% | |
| w | 417 | 0.9% | |
| f | 322 | 0.7% | |
| z | 217 | 0.5% | |
| x | 68 | 0.1% | |
| j | 68 | 0.1% | |
| Other values (21) | 172 | 0.4% |
Most frequent Space Separator characters
| Value | Count | Frequency (%) | |
| (space) | 5169 | 100.0% |
Most frequent Other Punctuation characters
| Value | Count | Frequency (%) | |
| . | 228 | 89.8% | |
| ' | 21 | 8.3% | |
| , | 5 | 2.0% |
Most frequent Dash Punctuation characters
| Value | Count | Frequency (%) | |
| - | 83 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) | |
| Latin | 57201 | 91.2% | |
| Common | 5506 | 8.8% |
Most frequent Latin characters
| Value | Count | Frequency (%) | |
| e | 5897 | 10.3% | |
| a | 5099 | 8.9% | |
| n | 4523 | 7.9% | |
| r | 4288 | 7.5% | |
| o | 3686 | 6.4% | |
| i | 3591 | 6.3% | |
| l | 2887 | 5.0% | |
| t | 2235 | 3.9% | |
| s | 2019 | 3.5% | |
| h | 1789 | 3.1% | |
| d | 1517 | 2.7% | |
| c | 1389 | 2.4% | |
| m | 1193 | 2.1% | |
| u | 1127 | 2.0% | |
| y | 1103 | 1.9% | |
| S | 990 | 1.7% | |
| k | 930 | 1.6% | |
| J | 895 | 1.6% | |
| M | 858 | 1.5% | |
| g | 802 | 1.4% | |
| R | 741 | 1.3% | |
| C | 692 | 1.2% | |
| B | 642 | 1.1% | |
| v | 617 | 1.1% | |
| D | 587 | 1.0% | |
| Other values (53) | 7104 | 12.4% |
Most frequent Common characters
| Value | Count | Frequency (%) | |
| (space) | 5169 | 93.9% | |
| . | 228 | 4.1% | |
| - | 83 | 1.5% | |
| ' | 21 | 0.4% | |
| , | 5 | 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) | |
| ASCII | 62552 | 99.8% | |
| None | 155 | 0.2% |
Most frequent ASCII characters
| Value | Count | Frequency (%) | |
| e | 5897 | 9.4% | |
| (space) | 5169 | 8.3% | |
| a | 5099 | 8.2% | |
| n | 4523 | 7.2% | |
| r | 4288 | 6.9% | |
| o | 3686 | 5.9% | |
| i | 3591 | 5.7% | |
| l | 2887 | 4.6% | |
| t | 2235 | 3.6% | |
| s | 2019 | 3.2% | |
| h | 1789 | 2.9% | |
| d | 1517 | 2.4% | |
| c | 1389 | 2.2% | |
| m | 1193 | 1.9% | |
| u | 1127 | 1.8% | |
| y | 1103 | 1.8% | |
| S | 990 | 1.6% | |
| k | 930 | 1.5% | |
| J | 895 | 1.4% | |
| M | 858 | 1.4% | |
| g | 802 | 1.3% | |
| R | 741 | 1.2% | |
| C | 692 | 1.1% | |
| B | 642 | 1.0% | |
| v | 617 | 1.0% | |
| Other values (32) | 7873 | 12.6% |
Most frequent None characters
| Value | Count | Frequency (%) | |
| é | 39 | 25.2% | |
| á | 27 | 17.4% | |
| ó | 18 | 11.6% | |
| ö | 15 | 9.7% | |
| í | 9 | 5.8% | |
| ñ | 7 | 4.5% | |
| å | 6 | 3.9% | |
| ç | 5 | 3.2% | |
| š | 4 | 2.6% | |
| É | 3 | 1.9% | |
| ô | 2 | 1.3% | |
| Ō | 2 | 1.3% | |
| ï | 2 | 1.3% | |
| ä | 2 | 1.3% | |
| Å | 2 | 1.3% | |
| ł | 2 | 1.3% | |
| À | 1 | 0.6% | |
| ø | 1 | 0.6% | |
| ń | 1 | 0.6% | |
| û | 1 | 0.6% | |
| Á | 1 | 0.6% | |
| ř | 1 | 0.6% | |
| Ø | 1 | 0.6% | |
| æ | 1 | 0.6% | |
| ž | 1 | 0.6% |
| Distinct | 2721 |
|---|---|
| Distinct (%) | 56.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 37.6 KiB |
| UNK | 53 |
|---|---|
| Jennifer Aniston | 15 |
| Morgan Freeman | 13 |
| Brad Pitt | 12 |
| Samuel L. Jackson | 12 |
| Other values (2716) |
| Value | Count | Frequency (%) | |
| UNK | 53 | 1.1% | |
| Jennifer Aniston | 15 | 0.3% | |
| Morgan Freeman | 13 | 0.3% | |
| Brad Pitt | 12 | 0.2% | |
| Samuel L. Jackson | 12 | 0.2% | |
| Robert De Niro | 11 | 0.2% | |
| Scarlett Johansson | 11 | 0.2% | |
| Alec Baldwin | 11 | 0.2% | |
| Diane Keaton | 11 | 0.2% | |
| Charlize Theron | 10 | 0.2% | |
| Julianne Moore | 10 | 0.2% | |
| Josh Hutcherson | 10 | 0.2% | |
| Gary Oldman | 10 | 0.2% | |
| Matt Damon | 10 | 0.2% | |
| Kate Winslet | 10 | 0.2% | |
| Philip Seymour Hoffman | 10 | 0.2% | |
| Gene Hackman | 10 | 0.2% | |
| Ben Kingsley | 9 | 0.2% | |
| Colin Firth | 9 | 0.2% | |
| Dustin Hoffman | 9 | 0.2% | |
| Ewan McGregor | 9 | 0.2% | |
| Justin Long | 9 | 0.2% | |
| Laurence Fishburne | 9 | 0.2% | |
| Drew Barrymore | 9 | 0.2% | |
| Gwyneth Paltrow | 9 | 0.2% | |
| Other values (2696) | 4502 | 93.7% |
Frequencies of value counts
Unique
| Unique | 1923 |
|---|---|
| Unique (%) | 40.0% |
Histogram of lengths of the category
Length
| Max length | 27 |
|---|---|
| Median length | 13 |
| Mean length | 12.98313554 |
| Min length | 3 |
Most occurring characters
| Value | Count | Frequency (%) | |
| e | 5975 | 9.6% | |
| a | 5647 | 9.1% | |
| (space) | 5065 | 8.1% | |
| n | 4626 | 7.4% | |
| r | 3903 | 6.3% | |
| i | 3823 | 6.1% | |
| o | 3371 | 5.4% | |
| l | 3085 | 4.9% | |
| t | 2351 | 3.8% | |
| s | 2241 | 3.6% | |
| h | 1785 | 2.9% | |
| d | 1266 | 2.0% | |
| y | 1255 | 2.0% | |
| u | 1249 | 2.0% | |
| c | 1202 | 1.9% | |
| m | 1177 | 1.9% | |
| M | 872 | 1.4% | |
| J | 850 | 1.4% | |
| g | 801 | 1.3% | |
| C | 769 | 1.2% | |
| S | 752 | 1.2% | |
| B | 719 | 1.2% | |
| k | 655 | 1.1% | |
| D | 626 | 1.0% | |
| R | 583 | 0.9% | |
| Other values (59) | 7710 | 12.4% |
Most occurring categories
| Value | Count | Frequency (%) | |
| Lowercase Letter | 46908 | 75.2% | |
| Uppercase Letter | 10178 | 16.3% | |
| Space Separator | 5065 | 8.1% | |
| Other Punctuation | 141 | 0.2% | |
| Dash Punctuation | 66 | 0.1% |
Most frequent Uppercase Letter characters
| Value | Count | Frequency (%) | |
| M | 872 | 8.6% | |
| J | 850 | 8.4% | |
| C | 769 | 7.6% | |
| S | 752 | 7.4% | |
| B | 719 | 7.1% | |
| D | 626 | 6.2% | |
| R | 583 | 5.7% | |
| A | 580 | 5.7% | |
| K | 521 | 5.1% | |
| H | 490 | 4.8% | |
| L | 475 | 4.7% | |
| G | 435 | 4.3% | |
| P | 401 | 3.9% | |
| T | 360 | 3.5% | |
| W | 358 | 3.5% | |
| E | 293 | 2.9% | |
| F | 283 | 2.8% | |
| N | 278 | 2.7% | |
| V | 128 | 1.3% | |
| O | 107 | 1.1% | |
| I | 76 | 0.7% | |
| U | 72 | 0.7% | |
| Z | 65 | 0.6% | |
| Y | 42 | 0.4% | |
| Q | 30 | 0.3% | |
| Other values (5) | 13 | 0.1% |
Most frequent Lowercase Letter characters
| Value | Count | Frequency (%) | |
| e | 5975 | 12.7% | |
| a | 5647 | 12.0% | |
| n | 4626 | 9.9% | |
| r | 3903 | 8.3% | |
| i | 3823 | 8.1% | |
| o | 3371 | 7.2% | |
| l | 3085 | 6.6% | |
| t | 2351 | 5.0% | |
| s | 2241 | 4.8% | |
| h | 1785 | 3.8% | |
| d | 1266 | 2.7% | |
| y | 1255 | 2.7% | |
| u | 1249 | 2.7% | |
| c | 1202 | 2.6% | |
| m | 1177 | 2.5% | |
| g | 801 | 1.7% | |
| k | 655 | 1.4% | |
| b | 429 | 0.9% | |
| v | 426 | 0.9% | |
| f | 395 | 0.8% | |
| w | 392 | 0.8% | |
| p | 345 | 0.7% | |
| z | 210 | 0.4% | |
| x | 83 | 0.2% | |
| é | 58 | 0.1% | |
| Other values (24) | 158 | 0.3% |
Most frequent Space Separator characters
| Value | Count | Frequency (%) | |
| (space) | 5065 | 100.0% |
Most frequent Other Punctuation characters
| Value | Count | Frequency (%) | |
| . | 102 | 72.3% | |
| ' | 37 | 26.2% | |
| " | 2 | 1.4% |
Most frequent Dash Punctuation characters
| Value | Count | Frequency (%) | |
| - | 66 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) | |
| Latin | 57086 | 91.5% | |
| Common | 5272 | 8.5% |
Most frequent Latin characters
| Value | Count | Frequency (%) | |
| e | 5975 | 10.5% | |
| a | 5647 | 9.9% | |
| n | 4626 | 8.1% | |
| r | 3903 | 6.8% | |
| i | 3823 | 6.7% | |
| o | 3371 | 5.9% | |
| l | 3085 | 5.4% | |
| t | 2351 | 4.1% | |
| s | 2241 | 3.9% | |
| h | 1785 | 3.1% | |
| d | 1266 | 2.2% | |
| y | 1255 | 2.2% | |
| u | 1249 | 2.2% | |
| c | 1202 | 2.1% | |
| m | 1177 | 2.1% | |
| M | 872 | 1.5% | |
| J | 850 | 1.5% | |
| g | 801 | 1.4% | |
| C | 769 | 1.3% | |
| S | 752 | 1.3% | |
| B | 719 | 1.3% | |
| k | 655 | 1.1% | |
| D | 626 | 1.1% | |
| R | 583 | 1.0% | |
| A | 580 | 1.0% | |
| Other values (54) | 6923 | 12.1% |
Most frequent Common characters
| Value | Count | Frequency (%) | |
| (space) | 5065 | 96.1% | |
| . | 102 | 1.9% | |
| - | 66 | 1.3% | |
| ' | 37 | 0.7% | |
| " | 2 | < 0.1% |
Most occurring blocks
| Value | Count | Frequency (%) | |
| ASCII | 62209 | 99.8% | |
| None | 149 | 0.2% |
Most frequent ASCII characters
| Value | Count | Frequency (%) | |
| e | 5975 | 9.6% | |
| a | 5647 | 9.1% | |
| (space) | 5065 | 8.1% | |
| n | 4626 | 7.4% | |
| r | 3903 | 6.3% | |
| i | 3823 | 6.1% | |
| o | 3371 | 5.4% | |
| l | 3085 | 5.0% | |
| t | 2351 | 3.8% | |
| s | 2241 | 3.6% | |
| h | 1785 | 2.9% | |
| d | 1266 | 2.0% | |
| y | 1255 | 2.0% | |
| u | 1249 | 2.0% | |
| c | 1202 | 1.9% | |
| m | 1177 | 1.9% | |
| M | 872 | 1.4% | |
| J | 850 | 1.4% | |
| g | 801 | 1.3% | |
| C | 769 | 1.2% | |
| S | 752 | 1.2% | |
| B | 719 | 1.2% | |
| k | 655 | 1.1% | |
| D | 626 | 1.0% | |
| R | 583 | 0.9% | |
| Other values (32) | 7561 | 12.2% |
Most frequent None characters
| Value | Count | Frequency (%) | |
| é | 58 | 38.9% | |
| á | 17 | 11.4% | |
| í | 15 | 10.1% | |
| ë | 10 | 6.7% | |
| ó | 8 | 5.4% | |
| ü | 4 | 2.7% | |
| å | 3 | 2.0% | |
| ñ | 3 | 2.0% | |
| ô | 3 | 2.0% | |
| ç | 3 | 2.0% | |
| Å | 2 | 1.3% | |
| ú | 2 | 1.3% | |
| ø | 2 | 1.3% | |
| ć | 2 | 1.3% | |
| ö | 2 | 1.3% | |
| è | 2 | 1.3% | |
| ê | 2 | 1.3% | |
| ï | 2 | 1.3% | |
| Á | 1 | 0.7% | |
| î | 1 | 0.7% | |
| ā | 1 | 0.7% | |
| š | 1 | 0.7% | |
| Ó | 1 | 0.7% | |
| ś | 1 | 0.7% | |
| č | 1 | 0.7% | |
| Other values (2) | 2 | 1.3% |
actor_2_name
| Distinct | 3096 |
|---|---|
| Distinct (%) | 64.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 37.6 KiB |
| Value | Count | Frequency (%) | |
| UNK | 63 | 1.3% | |
| Marisa Tomei | 9 | 0.2% | |
| Ed Harris | 9 | 0.2% | |
| Cameron Diaz | 9 | 0.2% | |
| John Goodman | 8 | 0.2% | |
| Josh Brolin | 8 | 0.2% | |
| Samuel L. Jackson | 8 | 0.2% | |
| Forest Whitaker | 8 | 0.2% | |
| Kevin Bacon | 8 | 0.2% | |
| Emma Watson | 8 | 0.2% | |
| Susan Sarandon | 8 | 0.2% | |
| Mark Ruffalo | 8 | 0.2% | |
| John Leguizamo | 7 | 0.1% | |
| Zooey Deschanel | 7 | 0.1% | |
| Woody Harrelson | 7 | 0.1% | |
| Jon Voight | 7 | 0.1% | |
| Steve Buscemi | 7 | 0.1% | |
| Ralph Fiennes | 7 | 0.1% | |
| Justin Timberlake | 7 | 0.1% | |
| Rosario Dawson | 7 | 0.1% | |
| Leslie Mann | 7 | 0.1% | |
| Nick Nolte | 6 | 0.1% | |
| Denise Richards | 6 | 0.1% | |
| Dan Aykroyd | 6 | 0.1% | |
| Sam Neill | 6 | 0.1% | |
| Other values (3071) | 4562 | 95.0% |
Frequencies of value counts
Unique
| Unique | 2281 |
|---|---|
| Unique (%) | 47.5% |
Histogram of lengths of the category
Length
| Max length | 27 |
|---|---|
| Median length | 13 |
| Mean length | 12.97085155 |
| Min length | 3 |
Most occurring characters
| Value | Count | Frequency (%) | |
| e | 5847 | 9.4% | |
| a | 5717 | 9.2% | |
| (space) | 5010 | 8.0% | |
| n | 4580 | 7.4% | |
| r | 3907 | 6.3% | |
| i | 3861 | 6.2% | |
| o | 3441 | 5.5% | |
| l | 3168 | 5.1% | |
| t | 2235 | 3.6% | |
| s | 2175 | 3.5% | |
| h | 1842 | 3.0% | |
| d | 1346 | 2.2% | |
| y | 1246 | 2.0% | |
| m | 1226 | 2.0% | |
| c | 1160 | 1.9% | |
| u | 1150 | 1.8% | |
| M | 890 | 1.4% | |
| J | 812 | 1.3% | |
| g | 765 | 1.2% | |
| C | 752 | 1.2% | |
| S | 748 | 1.2% | |
| B | 744 | 1.2% | |
| k | 642 | 1.0% | |
| D | 585 | 0.9% | |
| R | 534 | 0.9% | |
| Other values (80) | 7916 | 12.7% |
Most occurring categories
| Value | Count | Frequency (%) | |
| Lowercase Letter | 46952 | 75.4% | |
| Uppercase Letter | 10163 | 16.3% | |
| Space Separator | 5010 | 8.0% | |
| Other Punctuation | 97 | 0.2% | |
| Dash Punctuation | 74 | 0.1% | |
| Decimal Number | 2 | < 0.1% | |
| Nonspacing Mark | 1 | < 0.1% |
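
The category labels in the table above correspond to Unicode general categories (for example `Ll` for Lowercase Letter, `Zs` for Space Separator, `Mn` for Nonspacing Mark). The following is a minimal sketch of how such counts could be reproduced with Python's standard library; it is not the report generator's own code, and the helper name `category_counts` is hypothetical. It also shows why a single Nonspacing Mark can appear: a name stored with a decomposed accent (`e` followed by U+0301) contributes one `Mn` character, whereas the precomposed `é` counts as a lowercase letter.

```python
import unicodedata
from collections import Counter


def category_counts(names):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for name in names:
        for ch in name:
            # unicodedata.category returns a two-letter code such as "Lu" or "Zs".
            counts[unicodedata.category(ch)] += 1
    return counts


# Precomposed accent: U+00E9 is a single lowercase letter.
precomposed = category_counts(["Kerry Bish\u00e9"])

# Decomposed accent: "e" followed by the combining acute accent U+0301.
decomposed = category_counts(["Kerry Bishe\u0301"])
```

Normalizing the column with `unicodedata.normalize("NFC", name)` before profiling would fold the decomposed form into the precomposed one, so that both spellings are counted the same way.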
Most frequent Uppercase Letter characters
| Value | Count | Frequency (%) | |
| M | 890 | 8.8% | |
| J | 812 | 8.0% | |
| C | 752 | 7.4% | |
| S | 748 | 7.4% | |
| B | 744 | 7.3% | |
| D | 585 | 5.8% | |
| R | 534 | 5.3% | |
| A | 526 | 5.2% | |
| L | 517 | 5.1% | |
| K | 509 | 5.0% | |
| H | 484 | 4.8% | |
| P | 451 | 4.4% | |
| G | 425 | 4.2% | |
| T | 371 | 3.7% | |
| W | 347 | 3.4% | |
| E | 322 | 3.2% | |
| F | 283 | 2.8% | |
| N | 278 | 2.7% | |
| V | 142 | 1.4% | |
| O | 122 | 1.2% | |
| I | 92 | 0.9% | |
| U | 88 | 0.9% | |
| Z | 63 | 0.6% | |
| Y | 46 | 0.5% | |
| Q | 20 | 0.2% | |
| Other values (9) | 12 | 0.1% |
Most frequent Lowercase Letter characters
| Value | Count | Frequency (%) | |
| e | 5847 | 12.5% | |
| a | 5717 | 12.2% | |
| n | 4580 | 9.8% | |
| r | 3907 | 8.3% | |
| i | 3861 | 8.2% | |
| o | 3441 | 7.3% | |
| l | 3168 | 6.7% | |
| t | 2235 | 4.8% | |
| s | 2175 | 4.6% | |
| h | 1842 | 3.9% | |
| d | 1346 | 2.9% | |
| y | 1246 | 2.7% | |
| m | 1226 | 2.6% | |
| c | 1160 | 2.5% | |
| u | 1150 | 2.4% | |
| g | 765 | 1.6% | |
| k | 642 | 1.4% | |
| b | 478 | 1.0% | |
| v | 463 | 1.0% | |
| p | 397 | 0.8% | |
| w | 371 | 0.8% | |
| f | 340 | 0.7% | |
| z | 236 | 0.5% | |
| x | 100 | 0.2% | |
| é | 61 | 0.1% | |
| Other values (38) | 198 | 0.4% |
Most frequent Space Separator characters
| Value | Count | Frequency (%) | |
| (space) | 5010 | 100.0% |
Most frequent Dash Punctuation characters
| Value | Count | Frequency (%) | |
| - | 74 | 100.0% |
Most frequent Other Punctuation characters
| Value | Count | Frequency (%) | |
| . | 73 | 75.3% | |
| ' | 23 | 23.7% | |
| , | 1 | 1.0% |
Most frequent Decimal Number characters
| Value | Count | Frequency (%) | |
| 5 | 1 | 50.0% | |
| 0 | 1 | 50.0% |
Most frequent Nonspacing Mark characters
| Value | Count | Frequency (%) | |
| ́ | 1 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) | |
| Latin | 57104 | 91.7% | |
| Common | 5183 | 8.3% | |
| Cyrillic | 11 | < 0.1% | |
| Inherited | 1 | < 0.1% |
Most frequent Latin characters
| Value | Count | Frequency (%) | |
| e | 5847 | 10.2% | |
| a | 5717 | 10.0% | |
| n | 4580 | 8.0% | |
| r | 3907 | 6.8% | |
| i | 3861 | 6.8% | |
| o | 3441 | 6.0% | |
| l | 3168 | 5.5% | |
| t | 2235 | 3.9% | |
| s | 2175 | 3.8% | |
| h | 1842 | 3.2% | |
| d | 1346 | 2.4% | |
| y | 1246 | 2.2% | |
| m | 1226 | 2.1% | |
| c | 1160 | 2.0% | |
| u | 1150 | 2.0% | |
| M | 890 | 1.6% | |
| J | 812 | 1.4% | |
| g | 765 | 1.3% | |
| C | 752 | 1.3% | |
| S | 748 | 1.3% | |
| B | 744 | 1.3% | |
| k | 642 | 1.1% | |
| D | 585 | 1.0% | |
| R | 534 | 0.9% | |
| A | 526 | 0.9% | |
| Other values (63) | 7205 | 12.6% |
Most frequent Common characters
| Value | Count | Frequency (%) | |
| (space) | 5010 | 96.7% | |
| - | 74 | 1.4% | |
| . | 73 | 1.4% | |
| ' | 23 | 0.4% | |
| 5 | 1 | < 0.1% | |
| 0 | 1 | < 0.1% | |
| , | 1 | < 0.1% |
Most frequent Cyrillic characters
| Value | Count | Frequency (%) | |
| и | 3 | 27.3% | |
| Ю | 1 | 9.1% | |
| л | 1 | 9.1% | |
| я | 1 | 9.1% | |
| С | 1 | 9.1% | |
| н | 1 | 9.1% | |
| г | 1 | 9.1% | |
| р | 1 | 9.1% | |
| ь | 1 | 9.1% |
Most frequent Inherited characters
| Value | Count | Frequency (%) | |
| ́ | 1 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) | |
| ASCII | 62116 | 99.7% | |
| None | 167 | 0.3% | |
| Cyrillic | 11 | < 0.1% | |
| Latin Ext Additional | 4 | < 0.1% | |
| Diacriticals | 1 | < 0.1% |
Most frequent ASCII characters
| Value | Count | Frequency (%) | |
| e | 5847 | 9.4% | |
| a | 5717 | 9.2% | |
| (space) | 5010 | 8.1% | |
| n | 4580 | 7.4% | |
| r | 3907 | 6.3% | |
| i | 3861 | 6.2% | |
| o | 3441 | 5.5% | |
| l | 3168 | 5.1% | |
| t | 2235 | 3.6% | |
| s | 2175 | 3.5% | |
| h | 1842 | 3.0% | |
| d | 1346 | 2.2% | |
| y | 1246 | 2.0% | |
| m | 1226 | 2.0% | |
| c | 1160 | 1.9% | |
| u | 1150 | 1.9% | |
| M | 890 | 1.4% | |
| J | 812 | 1.3% | |
| g | 765 | 1.2% | |
| C | 752 | 1.2% | |
| S | 748 | 1.2% | |
| B | 744 | 1.2% | |
| k | 642 | 1.0% | |
| D | 585 | 0.9% | |
| R | 534 | 0.9% | |
| Other values (34) | 7733 | 12.4% |
Most frequent None characters
| Value | Count | Frequency (%) | |
| é | 61 | 36.5% | |
| á | 18 | 10.8% | |
| í | 16 | 9.6% | |
| ñ | 9 | 5.4% | |
| ë | 8 | 4.8% | |
| ü | 8 | 4.8% | |
| å | 5 | 3.0% | |
| ó | 5 | 3.0% | |
| ç | 4 | 2.4% | |
| ø | 3 | 1.8% | |
| è | 3 | 1.8% | |
| ö | 3 | 1.8% | |
| Å | 2 | 1.2% | |
| Đ | 2 | 1.2% | |
| ć | 2 | 1.2% | |
| ú | 2 | 1.2% | |
| à | 1 | 0.6% | |
| ș | 1 | 0.6% | |
| ä | 1 | 0.6% | |
| î | 1 | 0.6% | |
| û | 1 | 0.6% | |
| ı | 1 | 0.6% | |
| ğ | 1 | 0.6% | |
| ū | 1 | 0.6% | |
| ß | 1 | 0.6% | |
| Other values (7) | 7 | 4.2% |
Most frequent Cyrillic characters
| Value | Count | Frequency (%) | |
| и | 3 | 27.3% | |
| Ю | 1 | 9.1% | |
| л | 1 | 9.1% | |
| я | 1 | 9.1% | |
| С | 1 | 9.1% | |
| н | 1 | 9.1% | |
| г | 1 | 9.1% | |
| р | 1 | 9.1% | |
| ь | 1 | 9.1% |
Most frequent Latin Ext Additional characters
| Value | Count | Frequency (%) | |
| ỗ | 1 | 25.0% | |
| ị | 1 | 25.0% | |
| ả | 1 | 25.0% | |
| ế | 1 | 25.0% |
Most frequent Diacriticals characters
| Value | Count | Frequency (%) | |
| ́ | 1 | 100.0% |
actor_3_name
| Distinct | 3373 |
|---|---|
| Distinct (%) | 70.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 37.6 KiB |
| Value | Count | Frequency (%) | |
| UNK | 93 | 1.9% | |
| Woody Harrelson | 10 | 0.2% | |
| David Koechner | 8 | 0.2% | |
| Vincent D'Onofrio | 8 | 0.2% | |
| Alfred Molina | 8 | 0.2% | |
| Jim Broadbent | 8 | 0.2% | |
| Goran Visnjic | 7 | 0.1% | |
| Viola Davis | 7 | 0.1% | |
| Willem Dafoe | 7 | 0.1% | |
| Bill Murray | 7 | 0.1% | |
| Christopher Plummer | 7 | 0.1% | |
| James Doohan | 7 | 0.1% | |
| William H. Macy | 7 | 0.1% | |
| Robin Wright | 7 | 0.1% | |
| Judi Dench | 6 | 0.1% | |
| Sam Shepard | 6 | 0.1% | |
| Bill Paxton | 6 | 0.1% | |
| Amy Adams | 6 | 0.1% | |
| Stanley Tucci | 6 | 0.1% | |
| Gabrielle Union | 6 | 0.1% | |
| Ving Rhames | 6 | 0.1% | |
| Scott Glenn | 6 | 0.1% | |
| Maggie Gyllenhaal | 6 | 0.1% | |
| Steve Buscemi | 6 | 0.1% | |
| John C. Reilly | 6 | 0.1% | |
| Other values (3348) | 4546 | 94.6% |
Frequencies of value counts
Unique
| Unique | 2586 |
|---|---|
| Unique (%) | 53.8% |
Histogram of lengths of the category
Length
| Max length | 27 |
|---|---|
| Median length | 13 |
| Mean length | 12.98334374 |
| Min length | 3 |
Most occurring characters
| Value | Count | Frequency (%) | |
| e | 5732 | 9.2% | |
| a | 5635 | 9.0% | |
| (space) | 5065 | 8.1% | |
| n | 4558 | 7.3% | |
| i | 4017 | 6.4% | |
| r | 3826 | 6.1% | |
| o | 3407 | 5.5% | |
| l | 3265 | 5.2% | |
| s | 2221 | 3.6% | |
| t | 2125 | 3.4% | |
| h | 1786 | 2.9% | |
| d | 1310 | 2.1% | |
| y | 1252 | 2.0% | |
| c | 1223 | 2.0% | |
| u | 1194 | 1.9% | |
| m | 1187 | 1.9% | |
| M | 884 | 1.4% | |
| J | 823 | 1.3% | |
| S | 776 | 1.2% | |
| C | 741 | 1.2% | |
| B | 733 | 1.2% | |
| g | 725 | 1.2% | |
| k | 660 | 1.1% | |
| D | 659 | 1.1% | |
| R | 618 | 1.0% | |
| Other values (71) | 7937 | 12.7% |
Most occurring categories
| Value | Count | Frequency (%) | |
| Lowercase Letter | 46713 | 74.9% | |
| Uppercase Letter | 10290 | 16.5% | |
| Space Separator | 5065 | 8.1% | |
| Other Punctuation | 198 | 0.3% | |
| Dash Punctuation | 81 | 0.1% | |
| Other Letter | 10 | < 0.1% | |
| Decimal Number | 2 | < 0.1% |
Most frequent Uppercase Letter characters
| Value | Count | Frequency (%) | |
| M | 884 | 8.6% | |
| J | 823 | 8.0% | |
| S | 776 | 7.5% | |
| C | 741 | 7.2% | |
| B | 733 | 7.1% | |
| D | 659 | 6.4% | |
| R | 618 | 6.0% | |
| A | 574 | 5.6% | |
| K | 500 | 4.9% | |
| L | 493 | 4.8% | |
| H | 449 | 4.4% | |
| G | 441 | 4.3% | |
| P | 427 | 4.1% | |
| T | 374 | 3.6% | |
| W | 372 | 3.6% | |
| E | 310 | 3.0% | |
| N | 269 | 2.6% | |
| F | 238 | 2.3% | |
| V | 146 | 1.4% | |
| U | 127 | 1.2% | |
| O | 122 | 1.2% | |
| I | 86 | 0.8% | |
| Y | 46 | 0.4% | |
| Z | 45 | 0.4% | |
| Q | 18 | 0.2% | |
| Other values (7) | 19 | 0.2% |
Most frequent Lowercase Letter characters
| Value | Count | Frequency (%) | |
| e | 5732 | 12.3% | |
| a | 5635 | 12.1% | |
| n | 4558 | 9.8% | |
| i | 4017 | 8.6% | |
| r | 3826 | 8.2% | |
| o | 3407 | 7.3% | |
| l | 3265 | 7.0% | |
| s | 2221 | 4.8% | |
| t | 2125 | 4.5% | |
| h | 1786 | 3.8% | |
| d | 1310 | 2.8% | |
| y | 1252 | 2.7% | |
| c | 1223 | 2.6% | |
| u | 1194 | 2.6% | |
| m | 1187 | 2.5% | |
| g | 725 | 1.6% | |
| k | 660 | 1.4% | |
| b | 529 | 1.1% | |
| v | 461 | 1.0% | |
| p | 368 | 0.8% | |
| w | 355 | 0.8% | |
| f | 299 | 0.6% | |
| z | 248 | 0.5% | |
| x | 99 | 0.2% | |
| j | 68 | 0.1% | |
| Other values (24) | 163 | 0.3% |
Most frequent Space Separator characters
| Value | Count | Frequency (%) | |
| (space) | 5065 | 100.0% |
Most frequent Other Punctuation characters
| Value | Count | Frequency (%) | |
| . | 145 | 73.2% | |
| ' | 48 | 24.2% | |
| , | 3 | 1.5% | |
| " | 2 | 1.0% |
Most frequent Dash Punctuation characters
| Value | Count | Frequency (%) | |
| - | 81 | 100.0% |
Most frequent Decimal Number characters
| Value | Count | Frequency (%) | |
| 4 | 1 | 50.0% | |
| 0 | 1 | 50.0% |
Most frequent Other Letter characters
| Value | Count | Frequency (%) | |
| ی | 2 | 20.0% | |
| م | 2 | 20.0% | |
| ا | 2 | 20.0% | |
| پ | 1 | 10.0% | |
| ن | 1 | 10.0% | |
| ع | 1 | 10.0% | |
| د | 1 | 10.0% |
Most occurring scripts
| Value | Count | Frequency (%) | |
| Latin | 57003 | 91.4% | |
| Common | 5346 | 8.6% | |
| Arabic | 10 | < 0.1% |
Most frequent Latin characters
| Value | Count | Frequency (%) | |
| e | 5732 | 10.1% | |
| a | 5635 | 9.9% | |
| n | 4558 | 8.0% | |
| i | 4017 | 7.0% | |
| r | 3826 | 6.7% | |
| o | 3407 | 6.0% | |
| l | 3265 | 5.7% | |
| s | 2221 | 3.9% | |
| t | 2125 | 3.7% | |
| h | 1786 | 3.1% | |
| d | 1310 | 2.3% | |
| y | 1252 | 2.2% | |
| c | 1223 | 2.1% | |
| u | 1194 | 2.1% | |
| m | 1187 | 2.1% | |
| M | 884 | 1.6% | |
| J | 823 | 1.4% | |
| S | 776 | 1.4% | |
| C | 741 | 1.3% | |
| B | 733 | 1.3% | |
| g | 725 | 1.3% | |
| k | 660 | 1.2% | |
| D | 659 | 1.2% | |
| R | 618 | 1.1% | |
| A | 574 | 1.0% | |
| Other values (56) | 7072 | 12.4% |
Most frequent Common characters
| Value | Count | Frequency (%) | |
| (space) | 5065 | 94.7% | |
| . | 145 | 2.7% | |
| - | 81 | 1.5% | |
| ' | 48 | 0.9% | |
| , | 3 | 0.1% | |
| " | 2 | < 0.1% | |
| 4 | 1 | < 0.1% | |
| 0 | 1 | < 0.1% |
Most frequent Arabic characters
| Value | Count | Frequency (%) | |
| ی | 2 | 20.0% | |
| م | 2 | 20.0% | |
| ا | 2 | 20.0% | |
| پ | 1 | 10.0% | |
| ن | 1 | 10.0% | |
| ع | 1 | 10.0% | |
| د | 1 | 10.0% |
Most occurring blocks
| Value | Count | Frequency (%) | |
| ASCII | 62209 | 99.8% | |
| None | 140 | 0.2% | |
| Arabic | 10 | < 0.1% |
Most frequent ASCII characters
| Value | Count | Frequency (%) | |
| e | 5732 | 9.2% | |
| a | 5635 | 9.1% | |
| (space) | 5065 | 8.1% | |
| n | 4558 | 7.3% | |
| i | 4017 | 6.5% | |
| r | 3826 | 6.2% | |
| o | 3407 | 5.5% | |
| l | 3265 | 5.2% | |
| s | 2221 | 3.6% | |
| t | 2125 | 3.4% | |
| h | 1786 | 2.9% | |
| d | 1310 | 2.1% | |
| y | 1252 | 2.0% | |
| c | 1223 | 2.0% | |
| u | 1194 | 1.9% | |
| m | 1187 | 1.9% | |
| M | 884 | 1.4% | |
| J | 823 | 1.3% | |
| S | 776 | 1.2% | |
| C | 741 | 1.2% | |
| B | 733 | 1.2% | |
| g | 725 | 1.2% | |
| k | 660 | 1.1% | |
| D | 659 | 1.1% | |
| R | 618 | 1.0% | |
| Other values (35) | 7787 | 12.5% |
Most frequent None characters
| Value | Count | Frequency (%) | |
| é | 49 | 35.0% | |
| á | 13 | 9.3% | |
| í | 12 | 8.6% | |
| ë | 7 | 5.0% | |
| ñ | 7 | 5.0% | |
| å | 5 | 3.6% | |
| ó | 5 | 3.6% | |
| ö | 5 | 3.6% | |
| ç | 4 | 2.9% | |
| è | 4 | 2.9% | |
| Á | 3 | 2.1% | |
| ø | 3 | 2.1% | |
| Ó | 2 | 1.4% | |
| ō | 2 | 1.4% | |
| ä | 2 | 1.4% | |
| ń | 2 | 1.4% | |
| ü | 2 | 1.4% | |
| ć | 2 | 1.4% | |
| ș | 1 | 0.7% | |
| à | 1 | 0.7% | |
| Å | 1 | 0.7% | |
| ô | 1 | 0.7% | |
| ı | 1 | 0.7% | |
| É | 1 | 0.7% | |
| č | 1 | 0.7% | |
| Other values (4) | 4 | 2.9% |
Most frequent Arabic characters
| Value | Count | Frequency (%) | |
| ی | 2 | 20.0% | |
| م | 2 | 20.0% | |
| ا | 2 | 20.0% | |
| پ | 1 | 10.0% | |
| ن | 1 | 10.0% | |
| ع | 1 | 10.0% | |
| د | 1 | 10.0% |
Pearson's r
Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that, for a linear relationship, the angle of the line to the x-axis does not affect r. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
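
The calculation described above can be sketched in a few lines of plain Python. This is an illustrative implementation under simple assumptions (two equal-length numeric sequences, no missing values), not the code used to produce this report; the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products are sufficient.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Toy values for illustration only (not taken from the dataset).
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r_original = pearson_r(x, y)
# Shifting and rescaling x by a positive factor leaves r unchanged.
r_rescaled = pearson_r([10 * a + 5 for a in x], y)
```

Here `r_original` and `r_rescaled` agree up to floating-point rounding, which illustrates the invariance under changes of location and scale.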
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
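
As a rough sketch of that definition, the snippet below ranks both variables (tied values share their average rank, which is one common convention and an assumption here) and then applies the covariance-over-standard-deviations formula to the ranks. The helper names `average_ranks` and `spearman_rho` are hypothetical and the data are toy values.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# A nonlinear but strictly increasing relationship: the ranks agree perfectly.
rho_cubic = spearman_rho([1, 2, 3, 4], [1, 8, 27, 64])
```

Because only the ordering matters, the cubic relationship yields a ρ of 1 even though the points do not lie on a straight line.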
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
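
The pair-counting definition above corresponds to the variant usually called τ-a. The sketch below implements exactly that formula; statistical libraries often default to a tie-adjusted variant instead, so results can differ when the data contain ties. The function name `kendall_tau_a` is hypothetical and the data are toy values.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, ignoring tie corrections."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables order the pair the same way
        elif sign < 0:
            discordant += 1  # the variables order the pair in opposite ways
        # sign == 0 means a tie in x or y; it counts toward neither group
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
```

In this example, five of the six pairs are concordant and one is discordant, so τ = (5 - 1) / 6, roughly 0.67.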
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
First rows
| | Unnamed: 0 | budget | genres | homepage | id | plot_keywords | language | original_title | overview | popularity | production_companies | production_countries | release_date | gross | duration | spoken_languages | status | tagline | movie_title | vote_average | num_voted_users | title_year | country | director_name | actor_1_name | actor_2_name | actor_3_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 237000000 | Action\|Adventure\|Fantasy\|Science Fiction | http://www.avatarmovie.com/ | 19995 | culture clash\|future\|space war\|space colony\|society\|space travel\|futuristic\|romance\|space\|alien\|tribe\|alien planet\|cgi\|marine\|soldier\|battle\|love affair\|anti war\|power relations\|mind and soul\|3d | English | Avatar | In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization. | 150.437577 | [{'name': 'Ingenious Film Partners', 'id': 289}, {'name': 'Twentieth Century Fox Film Corporation', 'id': 306}, {'name': 'Dune Entertainment', 'id': 444}, {'name': 'Lightstorm Entertainment', 'id': 574}] | [{'iso_3166_1': 'US', 'name': 'United States of America'}, {'iso_3166_1': 'GB', 'name': 'United Kingdom'}] | 2009-12-10 | 2787965087 | 162 | [{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'es', 'name': 'Español'}] | Released | Enter the World of Pandora. | Avatar | 7.2 | 11800 | 2009 | United States of America | James Cameron | Zoe Saldana | Sigourney Weaver | Stephen Lang |
| 1 | 1 | 300000000 | Adventure\|Fantasy\|Action | http://disney.go.com/disneypictures/pirates/ | 285 | ocean\|drug abuse\|exotic island\|east india trading company\|love of one's life\|traitor\|shipwreck\|strong woman\|ship\|alliance\|calypso\|afterlife\|fighter\|pirate\|swashbuckler\|aftercreditsstinger | English | Pirates of the Caribbean: At World's End | Captain Barbossa, long believed to be dead, has come back to life and is headed to the edge of the Earth with Will Turner and Elizabeth Swann. But nothing is quite as it seems. | 139.082615 | [{'name': 'Walt Disney Pictures', 'id': 2}, {'name': 'Jerry Bruckheimer Films', 'id': 130}, {'name': 'Second Mate Productions', 'id': 19936}] | [{'iso_3166_1': 'US', 'name': 'United States of America'}] | 2007-05-19 | 961000000 | 169 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | At the end of the world, the adventure begins. | Pirates of the Caribbean: At World's End | 6.9 | 4500 | 2007 | United States of America | Gore Verbinski | Orlando Bloom | Keira Knightley | Stellan Skarsgård |
| 2 | 2 | 245000000 | Action\|Adventure\|Crime | http://www.sonypictures.com/movies/spectre/ | 206647 | spy\|based on novel\|secret agent\|sequel\|mi6\|british secret service\|united kingdom | Français | Spectre | A cryptic message from Bond’s past sends him on a trail to uncover a sinister organization. While M battles political forces to keep the secret service alive, Bond peels back the layers of deceit to reveal the terrible truth behind SPECTRE. | 107.376788 | [{'name': 'Columbia Pictures', 'id': 5}, {'name': 'Danjaq', 'id': 10761}, {'name': 'B24', 'id': 69434}] | [{'iso_3166_1': 'GB', 'name': 'United Kingdom'}, {'iso_3166_1': 'US', 'name': 'United States of America'}] | 2015-10-26 | 880674609 | 148 | [{'iso_639_1': 'fr', 'name': 'Français'}, {'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'es', 'name': 'Español'}, {'iso_639_1': 'it', 'name': 'Italiano'}, {'iso_639_1': 'de', 'name': 'Deutsch'}] | Released | A Plan No One Escapes | Spectre | 6.3 | 4466 | 2015 | United Kingdom | Sam Mendes | Christoph Waltz | Léa Seydoux | Ralph Fiennes |
| 3 | 3 | 250000000 | Action\|Crime\|Drama\|Thriller | http://www.thedarkknightrises.com/ | 49026 | dc comics\|crime fighter\|terrorist\|secret identity\|burglar\|hostage drama\|time bomb\|gotham city\|vigilante\|cover-up\|superhero\|villainess\|tragic hero\|terrorism\|destruction\|catwoman\|cat burglar\|imax\|flood\|criminal underworld\|batman | English | The Dark Knight Rises | Following the death of District Attorney Harvey Dent, Batman assumes responsibility for Dent's crimes to protect the late attorney's reputation and is subsequently hunted by the Gotham City Police Department. Eight years later, Batman encounters the mysterious Selina Kyle and the villainous Bane, a new terrorist leader who overwhelms Gotham's finest. The Dark Knight resurfaces to protect a city that has branded him an enemy. | 112.312950 | [{'name': 'Legendary Pictures', 'id': 923}, {'name': 'Warner Bros.', 'id': 6194}, {'name': 'DC Entertainment', 'id': 9993}, {'name': 'Syncopy', 'id': 9996}] | [{'iso_3166_1': 'US', 'name': 'United States of America'}] | 2012-07-16 | 1084939099 | 165 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | The Legend Ends | The Dark Knight Rises | 7.6 | 9106 | 2012 | United States of America | Christopher Nolan | Michael Caine | Gary Oldman | Anne Hathaway |
| 4 | 4 | 260000000 | Action\|Adventure\|Science Fiction | http://movies.disney.com/john-carter | 49529 | based on novel\|mars\|medallion\|space travel\|princess\|alien\|steampunk\|martian\|escape\|edgar rice burroughs\|alien race\|superhuman strength\|mars civilization\|sword and planet\|19th century\|3d | English | John Carter | John Carter is a war-weary, former military captain who's inexplicably transported to the mysterious and exotic planet of Barsoom (Mars) and reluctantly becomes embroiled in an epic conflict. It's a world on the brink of collapse, and Carter rediscovers his humanity when he realizes the survival of Barsoom and its people rests in his hands. | 43.926995 | [{'name': 'Walt Disney Pictures', 'id': 2}] | [{'iso_3166_1': 'US', 'name': 'United States of America'}] | 2012-03-07 | 284139100 | 132 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | Lost in our world, found in another. | John Carter | 6.1 | 2124 | 2012 | United States of America | Andrew Stanton | Lynn Collins | Samantha Morton | Willem Dafoe |
| 5 | 5 | 258000000 | Fantasy\|Action\|Adventure | http://www.sonypictures.com/movies/spider-man3/ | 559 | dual identity\|amnesia\|sandstorm\|love of one's life\|forgiveness\|spider\|wretch\|death of a friend\|egomania\|sand\|narcism\|hostility\|marvel comic\|sequel\|superhero\|revenge | English | Spider-Man 3 | The seemingly invincible Spider-Man goes up against an all-new crop of villain – including the shape-shifting Sandman. While Spider-Man’s superpowers are altered by an alien organism, his alter ego, Peter Parker, deals with nemesis Eddie Brock and also gets caught up in a love triangle. | 115.699814 | [{'name': 'Columbia Pictures', 'id': 5}, {'name': 'Laura Ziskin Productions', 'id': 326}, {'name': 'Marvel Enterprises', 'id': 19551}] | [{'iso_3166_1': 'US', 'name': 'United States of America'}] | 2007-05-01 | 890871626 | 139 | [{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'fr', 'name': 'Français'}] | Released | The battle within. | Spider-Man 3 | 5.9 | 3576 | 2007 | United States of America | Sam Raimi | Kirsten Dunst | James Franco | Thomas Haden Church |
| 6 | 6 | 260000000 | Animation\|Family | http://disney.go.com/disneypictures/tangled/ | 38757 | hostage\|magic\|horse\|fairy tale\|musical\|princess\|animation\|tower\|blonde woman\|selfishness\|healing power\|based on fairy tale\|duringcreditsstinger\|healing gift\|animal sidekick | English | Tangled | When the kingdom's most wanted-and most charming-bandit Flynn Rider hides out in a mysterious tower, he's taken hostage by Rapunzel, a beautiful and feisty tower-bound teen with 70 feet of magical, golden hair. Flynn's curious captor, who's looking for her ticket out of the tower where she's been locked away for years, strikes a deal with the handsome thief and the unlikely duo sets off on an action-packed escapade, complete with a super-cop horse, an over-protective chameleon and a gruff gang of pub thugs. | 48.681969 | [{'name': 'Walt Disney Pictures', 'id': 2}, {'name': 'Walt Disney Animation Studios', 'id': 6125}] | [{'iso_3166_1': 'US', 'name': 'United States of America'}] | 2010-11-24 | 591794936 | 100 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | They're taking adventure to new lengths. | Tangled | 7.4 | 3330 | 2010 | United States of America | Byron Howard | Mandy Moore | Donna Murphy | Ron Perlman |
| 7 | 7 | 280000000 | Action\|Adventure\|Science Fiction | http://marvel.com/movies/movie/193/avengers_age_of_ultron | 99861 | marvel comic\|sequel\|superhero\|based on comic book\|vision\|superhero team\|duringcreditsstinger\|marvel cinematic universe\|3d | English | Avengers: Age of Ultron | When Tony Stark tries to jumpstart a dormant peacekeeping program, things go awry and Earth’s Mightiest Heroes are put to the ultimate test as the fate of the planet hangs in the balance. As the villainous Ultron emerges, it is up to The Avengers to stop him from enacting his terrible plans, and soon uneasy alliances and unexpected action pave the way for an epic and unique global adventure. | 134.279229 | [{'name': 'Marvel Studios', 'id': 420}, {'name': 'Prime Focus', 'id': 15357}, {'name': 'Revolution Sun Studios', 'id': 76043}] | [{'iso_3166_1': 'US', 'name': 'United States of America'}] | 2015-04-22 | 1405403694 | 141 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | A New Age Has Come. | Avengers: Age of Ultron | 7.3 | 6767 | 2015 | United States of America | Joss Whedon | Chris Hemsworth | Mark Ruffalo | Chris Evans |
| 8 | 8 | 250000000 | Adventure\|Fantasy\|Family | http://harrypotter.warnerbros.com/harrypotterandthehalf-bloodprince/dvd/index.html | 767 | witch\|magic\|broom\|school of witchcraft\|wizardry\|apparition\|teenage crush\|werewolf | English | Harry Potter and the Half-Blood Prince | As Harry begins his sixth year at Hogwarts, he discovers an old book marked as 'Property of the Half-Blood Prince', and begins to learn more about Lord Voldemort's dark past. | 98.885637 | [{'name': 'Warner Bros.', 'id': 6194}, {'name': 'Heyday Films', 'id': 7364}] | [{'iso_3166_1': 'GB', 'name': 'United Kingdom'}, {'iso_3166_1': 'US', 'name': 'United States of America'}] | 2009-07-07 | 933959197 | 153 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | Dark Secrets Revealed | Harry Potter and the Half-Blood Prince | 7.4 | 5293 | 2009 | United Kingdom | David Yates | Rupert Grint | Emma Watson | Tom Felton |
| 9 | 9 | 250000000 | Action\|Adventure\|Fantasy | http://www.batmanvsupermandawnofjustice.com/ | 209112 | dc comics\|vigilante\|superhero\|based on comic book\|revenge\|super powers\|clark kent\|bruce wayne\|dc extended universe | English | Batman v Superman: Dawn of Justice | Fearing the actions of a god-like Super Hero left unchecked, Gotham City’s own formidable, forceful vigilante takes on Metropolis’s most revered, modern-day savior, while the world wrestles with what sort of hero it really needs. And with Batman and Superman at war with one another, a new threat quickly arises, putting mankind in greater danger than it’s ever known before. | 155.790452 | [{'name': 'DC Comics', 'id': 429}, {'name': 'Atlas Entertainment', 'id': 507}, {'name': 'Warner Bros.', 'id': 6194}, {'name': 'DC Entertainment', 'id': 9993}, {'name': 'Cruel & Unusual Films', 'id': 9995}, {'name': 'RatPac-Dune Entertainment', 'id': 41624}] | [{'iso_3166_1': 'US', 'name': 'United States of America'}] | 2016-03-23 | 873260194 | 151 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | Justice or revenge | Batman v Superman: Dawn of Justice | 5.7 | 7004 | 2016 | United States of America | Zack Snyder | Henry Cavill | Gal Gadot | Amy Adams |
Last rows
| | Unnamed: 0 | budget | genres | homepage | id | plot_keywords | language | original_title | overview | popularity | production_companies | production_countries | release_date | gross | duration | spoken_languages | status | tagline | movie_title | vote_average | num_voted_users | title_year | country | director_name | actor_1_name | actor_2_name | actor_3_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4793 | 4793 | 0 | Drama | UNK | 182291 | confession\|hazing\|gang member\|latino\|lgbt\|catholic priest\|shakespeare's romeo and juliet\|latino lgbt\|gang initiation\|gunplay | UNK | On The Downlow | Isaac and Angel are two young Latinos involved in a south side Chicago gang. They have a secret in a world where secrets are forbidden. | 0.029757 | [{'name': 'Iconoclast Films', 'id': 26677}] | [{'iso_3166_1': 'US', 'name': 'United States of America'}] | 2004-04-11 | 0 | 90 | [] | Released | Two gangs. One secret. One crossroad. | On The Downlow | 6.0 | 2 | 2004 | United States of America | Tadeo Garcia | Michael Cortez | Donato Cruz | Felipe Camacho |
| 4794 | 4794 | 0 | Thriller\|Horror\|Comedy | UNK | 286939 | UNK | English | Sanctuary: Quite a Conundrum | It should have been just a normal day of sex, fun, alcohol, hormones and debauchery for Tabitha and Mimi, two over-privileged twenty-somethings. But that so-called normalcy gets tossed out the window when a devastating event occurs at a pool party. | 0.166513 | [{'name': 'Gold Lion Films', 'id': 37870}, {'name': 'T-Street Productions', 'id': 37871}] | [{'iso_3166_1': 'US', 'name': 'United States of America'}] | 2012-01-20 | 0 | 82 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | UNK | Sanctuary: Quite a Conundrum | 0.0 | 0 | 2012 | United States of America | Thomas L. Phillips | Erin Cline | Emily Rogers | Anthony Rutowicz |
| 4795 | 4795 | 0 | Drama | UNK | 124606 | gang\|audition\|police fake\|homeless\|actress | English | Bang | A young woman in L.A. is having a bad day: she's evicted, an audition ends with a producer furious she won't trade sex for the part, and a policeman nabs her for something she didn't do, demanding fellatio to release her. She snaps, grabs his gun, takes his uniform, and leaves him cuffed to a tree where he's soon having a defenseless chat with a homeless man. She takes off on the cop's motorcycle and, for an afternoon, experiences a cop's life. She talks a young man out of suicide and then is plunged into violence after a friendly encounter with two "vatos." She is torn between self-protection and others' expectations. Is there any resolution for her torrent of feelings? | 0.918116 | [{'name': 'Asylum Films', 'id': 10571}, {'name': 'FM Entertainment', 'id': 26598}, {'name': 'Eagle Eye Films Inc.', 'id': 40739}] | [{'iso_3166_1': 'US', 'name': 'United States of America'}] | 1995-09-09 | 0 | 98 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | Sometimes you've got to break the rules | Bang | 6.0 | 1 | 1995 | United States of America | Ash Baron-Cohen | Peter Greene | Michael Newland | Erik Schrody |
| 4796 | 4796 | 7000 | Science Fiction\|Drama\|Thriller | http://www.primermovie.com | 14337 | distrust\|garage\|identity crisis\|time travel\|time machine\|mathematics\|independent film\|paradox\|mechanical engineering | English | Primer | Friends/fledgling entrepreneurs invent a device in their garage that reduces the apparent mass of any object placed inside it, but they accidentally discover that it has some highly unexpected capabilities -- ones that could enable them to do and to have seemingly anything they want. Taking advantage of this unique opportunity is the first challenge they face. Dealing with the consequences is the next. | 23.307949 | [{'name': 'Thinkfilm', 'id': 446}] | [{'iso_3166_1': 'US', 'name': 'United States of America'}] | 2004-10-08 | 424760 | 77 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | What happens if it actually works? | Primer | 6.9 | 658 | 2004 | United States of America | Shane Carruth | David Sullivan | Casey Gooden | Anand Upadhyaya |
| 4797 | 4797 | 0 | Foreign\|Thriller | UNK | 67238 | UNK | UNK | Cavite | Adam, a security guard, travels from California to the Philippines, his native land, for his father's funeral. He arrives in Manila. As he waits, a phone rings in his backpack; he answers it, and a male voice tells him that his mother and sister are captives and will be killed if Adam doesn't cooperate. Over the next hour, the voice sends Adam by bus, taxi, motorized tricycle, and on foot through an urban landscape of busy streets, cramped apartments, a fetid squatters' camp, a bank, a cockfighting arena, and a church. Adam's conversations with the voice cover murder, Islam, jihad, rebellion in Mindanao, and his family. What is it Adam will be commanded to do? | 0.022173 | [] | [] | 2005-03-12 | 0 | 80 | [] | Released | UNK | Cavite | 7.5 | 2 | 2005 | UNK | Neill Dela Llana | UNK | UNK | UNK |
| 4798 | 4798 | 220000 | Action\|Crime\|Thriller | UNK | 9367 | united states–mexico barrier\|legs\|arms\|paper knife\|guitar case | Español | El Mariachi | El Mariachi just wants to play his guitar and carry on the family tradition. Unfortunately, the town he tries to find work in has another visitor...a killer who carries his guns in a guitar case. The drug lord and his henchmen mistake El Mariachi for the killer, Azul, and chase him around town trying to kill him and get his guitar case. | 14.269792 | [{'name': 'Columbia Pictures', 'id': 5}] | [{'iso_3166_1': 'MX', 'name': 'Mexico'}, {'iso_3166_1': 'US', 'name': 'United States of America'}] | 1992-09-04 | 2040920 | 81 | [{'iso_639_1': 'es', 'name': 'Español'}] | Released | He didn't come looking for trouble, but trouble came looking for him. | El Mariachi | 6.6 | 238 | 1992 | Mexico | Robert Rodriguez | Jaime de Hoyos | Peter Marquardt | Reinol Martinez |
| 4799 | 4799 | 9000 | Comedy\|Romance | UNK | 72766 | UNK | UNK | Newlyweds | A newlywed couple's honeymoon is upended by the arrivals of their respective sisters. | 0.642552 | [] | [] | 2011-12-26 | 0 | 85 | [] | Released | A newlywed couple's honeymoon is upended by the arrivals of their respective sisters. | Newlyweds | 5.9 | 5 | 2011 | UNK | Edward Burns | Kerry Bishé | Marsha Dietlein | Caitlin Fitzgerald |
| 4800 | 4800 | 0 | Comedy|Drama|Romance|TV Movie | http://www.hallmarkchannel.com/signedsealeddelivered | 231617 | date|love at first sight|narration|investigation|team|postal worker | English | Signed, Sealed, Delivered | "Signed, Sealed, Delivered" introduces a dedicated quartet of civil servants in the Dead Letter Office of the U.S. Postal System who transform themselves into an elite team of lost-mail detectives. Their determination to deliver the seemingly undeliverable takes them out of the post office into an unpredictable world where letters and packages from the past save lives, solve crimes, reunite old loves, and change futures by arriving late, but always miraculously on time. | 1.444476 | [{'name': 'Front Street Pictures', 'id': 3958}, {'name': 'Muse Entertainment Enterprises', 'id': 6438}] | [{'iso_3166_1': 'US', 'name': 'United States of America'}] | 2013-10-13 | 0 | 120 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | UNK | Signed, Sealed, Delivered | 7.0 | 6 | 2013 | United States of America | Scott Smith | Kristin Booth | Crystal Lowe | Geoff Gustafson |
| 4801 | 4801 | 0 | UNK | http://shanghaicalling.com/ | 126186 | UNK | English | Shanghai Calling | When ambitious New York attorney Sam is sent to Shanghai on assignment, he immediately stumbles into a legal mess that could end his career. With the help of a beautiful relocation specialist, a well-connected old-timer, a clever journalist, and a street-smart legal assistant, Sam might just save his job, find romance, and learn to appreciate the beauty and wonders of Shanghai. Written by Anonymous (IMDB.com). | 0.857008 | [] | [{'iso_3166_1': 'US', 'name': 'United States of America'}, {'iso_3166_1': 'CN', 'name': 'China'}] | 2012-05-03 | 0 | 98 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | A New Yorker in Shanghai | Shanghai Calling | 5.7 | 7 | 2012 | United States of America | Daniel Hsia | Eliza Coupe | Bill Paxton | Alan Ruck |
| 4802 | 4802 | 0 | Documentary | UNK | 25975 | obsession|camcorder|crush|dream girl | English | My Date with Drew | Ever since the second grade when he first saw her in E.T. The Extraterrestrial, Brian Herzlinger has had a crush on Drew Barrymore. Now, 20 years later he's decided to try to fulfill his lifelong dream by asking her for a date. There's one small problem: She's Drew Barrymore and he's, well, Brian Herzlinger, a broke 27-year-old aspiring filmmaker from New Jersey. | 1.929883 | [{'name': 'rusty bear entertainment', 'id': 87986}, {'name': 'lucky crow films', 'id': 87987}] | [{'iso_3166_1': 'US', 'name': 'United States of America'}] | 2005-08-05 | 0 | 90 | [{'iso_639_1': 'en', 'name': 'English'}] | Released | UNK | My Date with Drew | 6.3 | 16 | 2005 | United States of America | Brian Herzlinger | Brian Herzlinger | Corey Feldman | Eric Roberts |